Combining Different Sized Drives with mergerfs and SnapRAID

This post may contain affiliate links. Please read my disclaimer for more info.

When building storage for your home server, there are lots of different routes you can take. You could buy a bunch of drives and build a FreeNAS server. Or maybe you want a hardware solution and RAID looks appealing? Or maybe you’ve heard really cool things about ZFS on Linux and want to try it out.

But what if you have a bunch of different sized drives? This blogpost explores how I use mergerfs and SnapRAID to combine all my drives into a single mount point and create a backup of those drives in case of a failure. It’s a great solution for a home server that can scale as your storage needs scale.

My server is running Ubuntu 18.04, but most of the commands and contents of the article should be pretty OS agnostic. Feel free to reach out to me if you have questions.

mergerfs

The first tool to talk about is mergerfs. The goal of mergerfs is to take your drives and combine them as if they are one large drive. So if you have a 1TB drive and a 4TB drive, and you want to store video files across both of them, you can have a single “Videos” directory that can be 5TB. There are some caveats, but that’s basically how it works.

Mergerfs is considered a “union filesystem” because it basically creates a union of all the underlying filesystems. It’s also a FUSE filesystem, meaning it runs in userspace rather than inside the Linux kernel.

So why use something like this over ZFS or RAID? The simplicity of it. Mergerfs doesn’t require you to have drives of the same size; you can bring your own drives and grow your pool over time. I think most of us have picked up drives through the years, and combining them into a single pool makes sense.

Is it the right solution for everyone? Of course not; ZFS, RAID, and other solutions all have their advantages. But if you are a small home user, this can be a very convenient option, as it’s inexpensive to get started, especially if you already have some drives laying around.

My Setup

For my home server I’ve got 4 drives: 2 x 1TB drives and 2 x 2TB drives. I use one of the 2TB drives as a parity disk for backups using SnapRAID (see the next section). That leaves 2 x 1TB drives and 1 x 2TB drive to create my 4TB storage pool across three drives.

These three drives are formatted as ext4 and mounted at /mnt/disk0, /mnt/disk1, and /mnt/disk2. Now, let’s say I wanted to store some pictures from vacation on my server. It would be rather annoying to check the free space on every drive and figure out where to put them. This is where mergerfs helps us.

On my server, I created a new directory called /srv/storage. This is where we’ll tell mergerfs to “pool” the storage together, so that if I add a new file to /srv/storage/Pictures/2020/Vacation, mergerfs automatically creates that directory and stores the file on /mnt/disk0/Pictures/2020/Vacation, if there is space available. If not, mergerfs will automatically use /mnt/disk1 and so on. Here’s a more visual explanation of what’s going on:
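For example, with the vacation pictures above, the pooled view versus the underlying disks might look something like this (the file names are made up, and exactly which disk each file lands on depends on the mergerfs create policy you choose):

    /srv/storage/                      <- the mergerfs pool you interact with
    └── Pictures/2020/Vacation/
        ├── beach.jpg
        └── sunset.jpg

    /mnt/disk0/                        <- where the files physically live
    └── Pictures/2020/Vacation/
        └── beach.jpg

    /mnt/disk1/
    └── Pictures/2020/Vacation/
        └── sunset.jpg

    /mnt/disk2/
        (empty for now)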

That’s what makes mergerfs so powerful: you can have a single directory on your server that people and scripts can write data into, and under the hood, mergerfs automatically puts each file on one of the drives that make up that mount point.

Installing and fstab

Installing is easy. If you’re using Ubuntu, download the Debian package from the mergerfs GitHub releases page; they also have releases for other operating systems. You can install it by running sudo dpkg -i name_of_package.deb.
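Something like this works, though the exact file name depends on the release you grab (the version and distribution suffix below are placeholders):

    wget https://github.com/trapexit/mergerfs/releases/download/2.28.3/mergerfs_2.28.3.ubuntu-bionic_amd64.deb
    sudo dpkg -i mergerfs_2.28.3.ubuntu-bionic_amd64.deb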

Next, you need to edit your fstab to mount your data drives (if you haven’t already) and also create your mergerfs pool. Use your favorite text editor to edit /etc/fstab. If you have never touched your fstab file before, reading some background info about the file can be helpful.

This fstab file, sketched after the list below, does the following:

  • Mounts the data drives
  • Mounts a parity drive (to be used in the next section)
  • Merges the data drives using mergerfs into a single mount point. You can see the full list of options in the mergerfs docs.
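Here’s a minimal sketch of what that can look like. The UUIDs are placeholders (run blkid to find yours), the parity drive mount point is just a name I’m assuming for the next section, and the mergerfs options are a common starting point rather than the one true set:

    # Data drives
    UUID=xxxxxxxx-xxxx-xxxx-xxxx-000000000000  /mnt/disk0   ext4  defaults  0  2
    UUID=xxxxxxxx-xxxx-xxxx-xxxx-000000000001  /mnt/disk1   ext4  defaults  0  2
    UUID=xxxxxxxx-xxxx-xxxx-xxxx-000000000002  /mnt/disk2   ext4  defaults  0  2

    # Parity drive (used by SnapRAID in the next section)
    UUID=xxxxxxxx-xxxx-xxxx-xxxx-000000000003  /mnt/parity1 ext4  defaults  0  2

    # mergerfs pool of the data drives
    /mnt/disk*  /srv/storage  fuse.mergerfs  defaults,allow_other,use_ino,moveonenospc=true,minfreespace=10G,fsname=mergerfs  0  0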

After editing your fstab file, save and reboot. If all goes well, your server should boot back up and you can now create new files in this shared directory. After creating a new file, go check the individual disks to see which disk the file landed on.
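A quick way to see this in action (the file name is arbitrary):

    touch /srv/storage/hello.txt
    ls /mnt/disk0 /mnt/disk1 /mnt/disk2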

SnapRAID

So you now have data sitting across all your different drives. And thanks to mergerfs it shows up in a single combined mount point so you can easily access it without thinking. But what happens if a drive fails? How could you recover the data?

That’s where SnapRAID comes into the picture. It allows you to set a disk aside to use as a “parity” disk. This disk won’t store any of your data; instead, it holds parity information so that if any disk dies, the data can be recovered. One caveat: the parity drive must be at least as large as your biggest data disk.

So let’s say we had the following disks:

  • 2 TB Drive
  • 1 TB Drive
  • 4 TB Drive
  • 4 TB Drive
  • 2 TB Drive

Then we could select one of the 4 TB drives as the parity disk and still have 9 TB of usable storage (2 TB + 1 TB + 4 TB + 2 TB) across the remaining disks.

One caveat to keep in mind: SnapRAID does not calculate parity “on write”, meaning it doesn’t happen automatically any time you write a new file. Instead, you must invoke the snapraid command-line tool to recalculate the parity file. Luckily, there is an easy way to automate it that I’ve included below.

Setting it Up

To set up SnapRAID, I’ll be giving installation instructions for Ubuntu 18.04. Check out the online documentation for more information.

First off, add a PPA that holds the SnapRAID binaries, update your apt cache, and install the package.
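Roughly like this; the PPA name below is only an example of a third-party SnapRAID PPA, so check the SnapRAID site for a source you trust (you can also build it from source):

    sudo add-apt-repository ppa:tikhonov/snapraid
    sudo apt update
    sudo apt install snapraid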

Next, you’ll want to edit the SnapRAID configuration file that sits at /etc/snapraid/snapraid.conf. Here is a simple one to get you started:
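A minimal sketch, assuming the parity drive is mounted at /mnt/parity1 and the data drives at /mnt/disk0 through /mnt/disk2 (adjust the paths and excludes to your own setup):

    # Where to store the parity information (the only file on the parity drive)
    parity /mnt/parity1/snapraid.parity

    # Content file copies: one on the host OS plus one on each data drive
    content /var/snapraid/snapraid.content
    content /mnt/disk0/snapraid.content
    content /mnt/disk1/snapraid.content
    content /mnt/disk2/snapraid.content

    # The data drives to protect (the root of each drive mount)
    disk d0 /mnt/disk0
    disk d1 /mnt/disk1
    disk d2 /mnt/disk2

    # Temporary files that aren't worth protecting
    exclude *.unrecoverable
    exclude /tmp/
    exclude /lost+found/
    exclude .AppleDouble
    exclude *.!sync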

Reading through the comments in the file should give you some background on the settings. At a high level:

  • We define where to keep the parity file. This will be the only file on the parity mounted drive.
  • We define where to keep the snapraid.content file. I keep one copy on my host operating system (in /var), as well as copies on all my data disks. Note that I don’t put them in the mergerfs pool; I make sure they are explicitly on each drive. You need at least one more copy of the content file than you have parity disks.
  • We define the actual drives to keep parity in check on. This should be the root of the drive mounts.
  • Finally, we exclude a bunch of files from parity calculations. These are mostly temporary files that it doesn’t make sense to back up.

Once that is all set up, try running snapraid sync. If your configuration file and disks are set up correctly, your parity disk should be calculated and created. This initial sync could take a long time depending on the size of your drives.
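For example:

    sudo snapraid sync
    # after it finishes, get a summary of the array's state
    sudo snapraid status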

Automating with SnapRAID Runner

As I mentioned above, SnapRAID doesn’t calculate parity “on write”, meaning you need to run snapraid sync every time you want to back up your drives. Luckily, there is a great tool called SnapRAID Runner that allows you to automate this.

SnapRAID Runner is a Python script that runs SnapRAID for you and will email you if any issues crop up. To install, clone the git repo into a good location.
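For example, assuming you want it under /opt (the GitHub path below is the Chronial/snapraid-runner project; verify it before cloning):

    cd /opt
    sudo git clone https://github.com/Chronial/snapraid-runner.git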

Then, create a configuration file in a good place (I put mine in /etc/snapraid-runner.conf) and configure it as shown below.
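Here’s a sketch of what that can look like. The section and key names follow the example config that ships with snapraid-runner, but they may differ slightly between versions, so compare against the repo’s snapraid-runner.conf.example; the email and SMTP values are placeholders for your own mail provider:

    [snapraid]
    executable = /usr/bin/snapraid
    config = /etc/snapraid/snapraid.conf
    ; cancel the sync if more than this many files were deleted
    deletethreshold = 40
    touch = false

    [logging]
    file = /var/log/snapraid-runner.log
    maxsize = 5000

    [email]
    sendon = error
    short = true
    subject = [SnapRAID] Status Report:
    from = snapraid@example.com
    to = you@example.com

    [smtp]
    host = smtp.mailgun.org
    port = 465
    ssl = true
    user = postmaster@example.com
    password = your-smtp-password

    [scrub]
    ; also check existing parity for errors after each sync
    enabled = true
    percentage = 12
    older-than = 10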

The configuration file does the following:

  • Tells snapraid-runner where SnapRAID is installed and where the configuration file is.
  • The deletethreshold tells snapraid-runner to cancel the sync if more than 40 files got deleted.
  • The email and smtp sections can be set up with an SMTP server to send you an email if the sync fails. I use mailgun for this, their free account should get you pretty far.
  • We also enable “scrub” which checks for parity errors across the disks.

Finally, we can add snapraid-runner to a cron job to automatically run on a set schedule. I have mine set up to run on Sundays at 4:30 AM. Of course you can customize this for what makes sense for you.

Open your crontab by running sudo crontab -e and then add the following line to call snapraid-runner. Adjust the paths to match where you installed snapraid-runner and its configuration file.
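Assuming the paths from earlier (the repo cloned into /opt and the config at /etc/snapraid-runner.conf), an entry for Sundays at 4:30 AM looks like this; check the snapraid-runner README for the exact invocation your version expects:

    30 4 * * 0 python3 /opt/snapraid-runner/snapraid-runner.py -c /etc/snapraid-runner.conf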

Save the crontab and you should be good to go! SnapRAID will now run on a set schedule to back up your drives.

Conclusions

If you’re like me, you’ve probably collected drives of various sizes and brands over the years, and the flexibility of mergerfs and SnapRAID makes it easy for home-labbers to create a data pool out of the disks you have laying around. You don’t need to go out and buy new disks of all the same size to get a good storage solution.

I’m just scratching the surface here of some of the cool utilities for managing the data on your home server. There are tools like rsnapshot for backing up remote servers, and Ansible for keeping these configuration files in a git repository and deploying them. And of course there are things like Samba and NFS for sharing the pool on your network.

If you found this mergerfs + SnapRAID tutorial helpful, check out some of my other articles.

Please consider supporting the blog by joining my mailing list, following the blog on social media or directly through Buy Me a Coffee. All of those really make a difference, thanks for reading!