A little background & why B2?

It was kind of background knowledge for me that object storage was an option in Mastodon hosting, but I never felt much need for it on my tiny instance. Then it became an urgent issue when fedi activity exploded in November in the wake of the Twitter meltdown. My instance’s database started crashing because the 40 GB local drive kept overflowing with cached media, and usage stayed above 30 GB even when media cleanup kept only one day’s worth of cache (tootctl media remove --days 1).

I already had a Backblaze/B2 account that I had been using for my personal offsite backups, and I calculated that I could similarly hook it up to my Hometown/Mastodon instance at a fraction of the cost of adding more storage volume to the Hetzner server. I also wanted to keep using B2 for this rather than create a new account with AWS or some other storage service, feeling no need to complicate things with yet another account and service to keep track of.

The problem I ran into was that this particular combination of Mastodon and B2 is woefully underdocumented, even with B2’s S3 compatibility. This led me into a lot of trial and error, because the documentation I did find was outdated¹ and/or did not mention issues unique to B2, like a huge authentication pitfall that I ended up pitching headfirst into.

Let me discuss that pitfall first, in case you don’t need the rest of this guide: you need to use a regular B2 application key, NOT the master application key, for this purpose. If, like me, you have everything else set up correctly and media uploads still fail for no apparent reason, this might be why. More details in Step 1 below.

So here it is, the walkthrough of the process and settings that I wished I had when I configured my setup, put together from other sources and my own trial and error.

Step 1: Set up a B2 bucket and application key for your instance

This part is going to be pretty obvious if you already use B2. Otherwise, the official tutorial for creating a bucket should be enough. Everything I have read says the bucket’s privacy setting should be public, though this comes with a risk: anyone can download from the bucket, which could eat into your traffic allowance and cost you money. If you haven’t done so already, you might have to verify your email address to set the bucket to public.

Make note of the bucket’s address, which is the endpoint listed in your bucket information, preceded by your bucket name. If you named your bucket my-instance-media, your bucket address would be something like:

my-instance-media.s3.us-west-900.backblazeb2.com

You can verify this by uploading a file to the bucket and viewing the address of the file, which will be something like:

my-instance-media.s3.us-west-900.backblazeb2.com/my-test-file.txt
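
If you would rather check from the command line, a plain HEAD request against that address should work once the bucket is public (my-test-file.txt here is just the example test file mentioned above):

curl -I https://my-instance-media.s3.us-west-900.backblazeb2.com/my-test-file.txt

A 200 response means the address and the public setting are both correct; a 401/403 usually means the bucket is still private.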

If you don’t have a B2 application key, their official tutorial should get you started. Also, as discussed above, make sure you use a non-master application key pair for the setup in Step 4 below. Master application keys are NOT S3-compatible (see “Warning”), and if you set up your .env.production with one, your setup will not work!

Note down the application key ID and application key pair in a secure location such as your password manager, especially the application key itself, which is shown only once and never again, whether in the browser interface or otherwise.
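
As an aside: once you have the b2 command line tool from Step 3 authorized, you can also create a key scoped to just this bucket from the terminal instead of the web interface. The line below is only an illustration, not an official recipe: the key name and capability list are my own guess at a sensible minimum, creating keys requires the master key or a key with the writeKeys capability, and the exact syntax can differ between b2 CLI versions (check b2 --help):

./b2 create-key --bucket my-instance-media my-instance-key listBuckets,listFiles,readFiles,writeFiles,deleteFiles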

Step 2: Set up a proxy on nginx

As noted in the official Mastodon documentation, it is strongly recommended that you set up a proxy local to the server to cache media requested from the bucket. If every request went directly to your bucket, your traffic meter could climb rapidly and cost you more money than it needs to. I modeled my nginx configuration on one for a different S3-compatible service and followed the directions in the Mastodon documentation on configuring the proxy.

Here’s what my configuration, anonymized to /etc/nginx/sites-available/files.example.com, looks like:

    proxy_cache_path /tmp/nginx-cache-instance-media levels=1:2 keys_zone=s3_cache:10m max_size=10g inactive=48h use_temp_path=off;

    server {

      listen 443 ssl http2;
      listen [::]:443 ssl http2;
      # CUSTOMIZE THE VALUE BELOW TO YOUR OWN SUBDOMAIN
      server_name files.example.com;

      root /home/mastodon/live/public/system;

      access_log off;
      # CUSTOMIZE THE VALUE BELOW TO YOUR DESIRED ERROR LOG FILE NAME
      error_log /var/log/nginx/files-error.log;

      keepalive_timeout 60;

      location = / {
        index index.html;
      }

      location / {
        try_files $uri @s3;
      }

      # CUSTOMIZE THE VALUE BELOW TO YOUR BUCKET ADDRESS
      set $s3_backend 'https://my-instance-media.s3.us-west-900.backblazeb2.com';

      location @s3 {
        limit_except GET {
          deny all;
        }

        resolver 9.9.9.9;
        # CUSTOMIZE THE VALUE BELOW TO YOUR BUCKET ADDRESS
        proxy_set_header Host 'my-instance-media.s3.us-west-900.backblazeb2.com';
        proxy_set_header Connection '';
        proxy_set_header Authorization '';
        proxy_hide_header Set-Cookie;
        proxy_hide_header 'Access-Control-Allow-Origin';
        proxy_hide_header 'Access-Control-Allow-Methods';
        proxy_hide_header 'Access-Control-Allow-Headers';
        proxy_hide_header x-amz-id-2;
        proxy_hide_header x-amz-request-id;
        proxy_hide_header x-amz-meta-server-side-encryption;
        proxy_hide_header x-amz-server-side-encryption;
        proxy_hide_header x-amz-bucket-region;
        proxy_hide_header x-amzn-requestid;
        proxy_ignore_headers Set-Cookie;
        proxy_pass $s3_backend$uri;
        proxy_intercept_errors off;

        proxy_cache s3_cache;
        proxy_cache_valid 200 304 48h;
        proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
        proxy_cache_lock on;

        expires 1y;
        add_header Cache-Control public;
        add_header 'Access-Control-Allow-Origin' '*';
        add_header X-Cache-Status $upstream_cache_status;
      }

    }

The specific addresses and names should be customized to your desired settings, as marked in the configuration text.

When the configuration file is written to your satisfaction, save it and symlink it from /etc/nginx/sites-enabled, and reload nginx by running (with sudo if you are not the root user here):

ln -s /etc/nginx/sites-available/files.example.com /etc/nginx/sites-enabled/
systemctl reload nginx

Then get an SSL certificate for the domain, as seen in the Mastodon documentation.

certbot --nginx -d files.example.com
systemctl reload nginx

This was the main place I diverged from the configuration posted on the thomas-leister.de website, by the way: I use port 443 for an encrypted connection per the Mastodon documentation rather than 80 for an unencrypted one like Thomas Leister did, mainly because the unencrypted connection broke all the images on my instance lol.
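
Before moving on, you can sanity-check the proxy with curl now that the certificate is in place. Assuming you still have the test file from Step 1 sitting in the bucket, a request through the new subdomain should come back with the X-Cache-Status header added in the configuration above:

curl -I https://files.example.com/my-test-file.txt

Run it twice: X-Cache-Status should read MISS on the first request and HIT on the second, which confirms both the proxy_pass to the bucket and the local cache are doing their jobs.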

Step 3: Upload existing Mastodon media to your bucket

If your instance is already in use, you should upload its previously downloaded media to the bucket. There are several tools that can do this; if you already use an S3-compatible tool such as the aws CLI or s3cmd, it should do the job. Just be aware that you’ll need an S3-compatible, non-master B2 application key to authenticate it, as discussed above.

I used the official b2 command line tool, since it’s a simple binary and fairly easy to use. I downloaded b2 for Linux through the link on this page, uploaded it to the server’s /home/mastodon/live directory (though in hindsight its bin subdirectory would have been more fitting), and changed the owner to the mastodon user with:

sudo chown mastodon:mastodon b2-linux

Switched to the mastodon user:

sudo su - mastodon

Changed the file name to b2 for simplicity’s sake:

mv b2-linux b2

And gave it execute permission:

chmod +x b2

I didn’t mess with $PATH or anything like that, since this wasn’t going to be an everyday operation.

You can then create an authentication profile using the application key ID and application key generated in Step 1 above.

./b2 authorize-account --profile my-instance $B2_Application_Key_ID $B2_Application_Key

The variables $B2_Application_Key_ID and $B2_Application_Key should be replaced by the actual values, of course. (Or you could actually define the variables, I guess, but I didn’t feel the need, since authentication is a one-time thing and, once successful, the --profile my-instance switch is enough to authenticate all further operations.)

After setting up the profile with authorize-account you can use some short, harmless command like list-buckets to test whether authentication works:

./b2 list-buckets --profile my-instance

Or maybe try uploading a small file. The --help switch is helpful for figuring out the commands and syntax; simply running b2 without any arguments will also bring up the help options.
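
For example, something like the following should confirm the key can actually write to the bucket. The file name is just a throwaway example, and I’m assuming --profile works here the same way it does for list-buckets and sync; you can delete the test file from the web interface afterwards:

echo "hello object storage" > test-upload.txt
./b2 upload-file --profile my-instance my-instance-media test-upload.txt test-upload.txt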

Once authentication is confirmed to work, sync the public/system directory to the remote B2 bucket using the sync command. If you haven’t already, it’s a good idea to run some media cleanup first to minimize the number of files to upload to the bucket. Here are the commands I used, from /home/mastodon/bin:

./tootctl media remove --days 1
./tootctl media remove --prune-profiles
./tootctl media remove --remove-headers

When you are ready to start moving the files, assuming the command is run from the /home/mastodon/live directory:

./b2 sync --profile my-instance ./public/system/ b2://my-instance-media/

You can read more about b2’s sync command options, but I found the default options satisfactory.
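
Depending on how much media your instance has accumulated, the sync can take a while. Once it finishes, a quick listing is an easy way to spot-check that the directory structure made it over (the ls syntax may differ slightly between b2 CLI versions):

./b2 ls --profile my-instance my-instance-media

You should see top-level folders such as media_attachments and cache mirroring your local public/system directory.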

Step 4: Mastodon configuration

My Mastodon configuration in live/.env.production to enable the object storage looks something like this:

S3_ENABLED=true
S3_PROTOCOL=https
# EVERYTHING BELOW THIS POINT SHOULD BE CUSTOMIZED
S3_BUCKET=my-instance-media
AWS_ACCESS_KEY_ID=$B2_Application_Key_ID
AWS_SECRET_ACCESS_KEY=$B2_Application_Key
S3_ALIAS_HOST=files.example.com
S3_HOSTNAME=files.example.com
S3_REGION=us-west-900
S3_ENDPOINT=https://s3.us-west-900.backblazeb2.com

In addition to the earlier point that the application key ID and application key should come from a non-master application key, note the https:// in front of the S3_ENDPOINT value. For me that was the final hurdle to getting the setup to work.

Switch to admin or some other user with sudo privileges (from the mastodon user, an exit command is enough in my case) and restart the Mastodon processes:

sudo systemctl restart mastodon-*.service

Check whether the instance works normally. If it’s down, the API calls to Backblaze storage may be failing, and the key ID and application key values should be double-checked.
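
If you want more detail than the instance simply being up or down, the systemd journals are the place to look. Assuming the standard unit names mastodon-web and mastodon-sidekiq, something like this will surface recent storage-related errors:

sudo journalctl -u mastodon-web -u mastodon-sidekiq --since "10 minutes ago" | grep -iE "s3|aws|denied"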

Step 5: Check if object storage is working

As discussed in the Thomas Leister writeup (“Checking if it works”), check the browser’s console to see if the correct server proxy is loading up for media, and whether media are properly displayed.

Also, try attaching a piece of media to a post. If the attachment fails with a 500 error, you need to check your settings.

Even after I ironed out the authentication issues with the application key, I found media uploads were, understandably, slower than before and sometimes timed out. This is why I set the keepalive_timeout value to 60 rather than 30 in the nginx proxy settings; image uploads have not timed out since.

Though Mastodon will upload new media to the remote bucket and request it remotely, for preexisting media files it will look to the local public/system directory first. This can make it hard to tell whether the bucket setup is working or local storage is still doing the work, so if you’re impatient you can get rid of that directory to force the instance to load everything from the bucket instead. From /home/mastodon/live/public you can run:

mv system/ system_/

to change the name of the system directory without immediately deleting everything in it.

You can’t simply leave public/system missing, though, if you want to keep the nginx proxy settings as they are. Guess who found out the hard way that this will crash the instance… :’) Instead, create a new, empty system directory so the setting has somewhere to look and doesn’t throw an error.

mkdir system

If the media still loads properly after this, and new media is fetched and uploaded, it means the setup is working. Yay!

Cleanup, afterwork and thoughts

You can let this setup run for a few days to see whether it keeps working, doesn’t overrun your traffic meters, and so on, before you empty out your local public/system directory (or delete public/system_, if you did the directory switch detailed above). I can tell you it was quite a weight off to reclaim half my disk space from all that media.
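
When you are confident everything works, the actual cleanup is quick. From /home/mastodon/live/public, and assuming you did the rename from Step 5, you can check how much space you are about to reclaim and then remove the old copy for good:

du -sh system_/
rm -rf system_/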

I also ran some accounts refresh jobs because I had missing remote profile pics from emergency media deletions, back when my disk had overflowed and the database crashed. Yeah, things were that bad.
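
For reference, the refresh is another tootctl job, run from /home/mastodon/bin like the cleanup commands earlier. As far as I know, the --all switch re-fetches every remote account and can take a while on a busier instance; it also accepts specific usernames or a --domain switch if you want to narrow it down:

./tootctl accounts refresh --all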

Media loads correctly again on my instance, though with an initial loading delay, and I can keep a proper media cache going without my disk being at constant risk of filling up. Instance management has become enjoyable again without the looming threat of unplanned server downtime, and I am now able to consider putting other services on the server.

In the long term, media storage is something federated software and communities are going to have to figure out. Services like Jortage look interesting, and something like it may be the future of media storage in the fediverse. For now I have found a solution that works for my instance, and if this write-up helps others avoid some of my confusion and mistakes, I will be happy, although, let’s be real, these tech posts have mainly been helpful to myself for the purpose of record-keeping and documentation.

(Updated on 12/18/2023: Fixed a line break in the first line of the nginx configuration, added an advisory to clean up media before syncing.)


  1. For instance, there was a gist stating that Mastodon could not directly interface with B2 for object storage because B2 was not S3-compatible, and that MinIO would be needed as a relay. This was seemingly confirmed by documentation from Backblaze itself stating its S3 incompatibility. Turns out this was back in 2019-2020 and, as of late 2022, B2 is S3-compatible and MinIO no longer provides the relay function. Guess who only realized this after installing MinIO. ↩︎