When building storage for your home server, there are lots of different routes you can take. You could buy a bunch of drives and build a FreeNAS server. Or maybe you want a hardware solution and RAID looks appealing? Or maybe you’ve heard really cool things about ZFS on Linux and want to try it out.
But what if you have a bunch of different sized drives? This blog post explores how I use mergerfs and SnapRAID to combine all my drives into a single mount point and create a backup of those drives in case of a failure. It’s a great solution for a home server that can scale as your storage needs grow.
My server is running Ubuntu 18.04, but most of the commands and contents of the article should be pretty OS agnostic. Feel free to reach out to me if you have questions.
mergerfs
The first tool to talk about is mergerfs. The goal of mergerfs is to take your drives and combine them as if they are one large drive. So if you have a 1TB drive and a 4TB drive, and you want to store video files across both of them, you can have a single “Videos” directory that can be 5TB. There are some caveats, but that’s basically how it works.
Mergerfs is considered a “union filesystem” because it creates a union of all the underlying filesystems. It’s also a FUSE filesystem, meaning it runs in userspace rather than inside the Linux kernel.
So why use something like this over ZFS or RAID? The simplicity of it. Mergerfs doesn’t require drives of the same size; you can bring your own drives and grow your pool over time. I think most of us have picked up drives through the years, and combining them into a single pool makes sense.
Is it the right solution for everyone? Of course not; ZFS, RAID, and other solutions all have their advantages. But if you’re a small home user, this can be a very convenient option since it’s inexpensive to get started, especially if you already have some drives laying around.
My Setup
For my home server I’ve got four drives: 2 x 1 TB and 2 x 2 TB. I use one of the 2 TB drives as a parity disk for backups using SnapRAID (see the next section). That leaves 2 x 1 TB drives and 1 x 2 TB drive to create my 4 TB storage pool across three drives.
These three drives are formatted as ext4 and are mounted as `/mnt/disk0`, `/mnt/disk1`, and `/mnt/disk2`. Now, let’s say I wanted to store some pictures from vacation on my server. It would be rather annoying to check the free space on each of my drives and figure out where to store the pictures. This is where mergerfs helps us.
On my server, I created a new directory called `/srv/storage`. This is where we’ll tell mergerfs to “pool” the storage together, so that if I add a new file to `/srv/storage/Pictures/2020/Vacation`, mergerfs will automatically create that directory and store the file at `/mnt/disk0/Pictures/2020/Vacation`, if there is space available. If not, mergerfs will automatically use `/mnt/disk1` and so on. Here’s a more visual explanation of what’s going on:
```
A           +    B           =    C
/mnt/disk0       /mnt/disk1       /srv/storage
|                |                |
+-- /dir1        +-- /dir1        +-- /dir1
|   |            |   |            |   |
|   +-- file1    |   +-- file2    |   +-- file1
|                |   +-- file3    |   +-- file2
+-- /dir2        |                |   +-- file3
|   |            +-- /dir3        |
|   +-- file4        |            +-- /dir2
|                    +-- file5    |   |
+-- file6                         |   +-- file4
                                  |
                                  +-- /dir3
                                  |   |
                                  |   +-- file5
                                  |
                                  +-- file6
```
That’s what makes mergerfs so powerful, you can have a single directory on your server that people and scripts can write data into. And under the hood, mergerfs automatically puts that file on the drives that make up that mount point.
Installing and fstab
Installing is easy. If you’re using Ubuntu, download the Debian package from the mergerfs GitHub release page. They also have releases for other operating systems. You can install it by running `sudo dpkg -i name_of_package.deb`.
```bash
# Download mergerfs debian package
wget https://github.com/trapexit/mergerfs/releases/download/2.28.3/mergerfs_2.28.3.ubuntu-xenial_amd64.deb

# Install deb package
sudo dpkg -i mergerfs_2.28.3.ubuntu-xenial_amd64.deb
```
Next, you need to edit your fstab to mount your data drives (if you haven’t already) and also create your mergerfs pool. Use your favorite text editor to edit `/etc/fstab`. If you have never touched your fstab file before, reading some background info about the file can be helpful.
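The UUIDs in the example below come from my drives. To find your own, list your block devices first. A quick way to do that (the device name here is just an example; substitute your own):

```bash
# Show all block devices with their filesystems, UUIDs and mount points
lsblk -f

# Or print the UUID of one specific partition
sudo blkid /dev/sdb1
```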
1 2 3 4 5 6 7 |
UUID=cb510364-ab26-4b80-b517-9d5a156f7122 /mnt/disk0 ext4 defaults 0 0 UUID=8672b8ec-58b8-4f46-99be-67bfaef36dfc /mnt/disk1 ext4 defaults 0 0 UUID=c4c200e9-d7c4-4112-970e-fc105fa65ed9 /mnt/disk2 ext4 defaults 0 0 UUID=42cbe82f-35e5-45a7-8c4a-a8ba24fd1ff9 /mnt/parity ext4 defaults 0 0 # mergerfs pool /mnt/disk* /srv/storage fuse.mergerfs direct_io,defaults,allow_other,minfreespace=50G,fsname=mergerfs 0 0 |
This fstab file:
- Mounts the data drives
- Mounts the parity drive (to be used in the next section)
- Merges the data drives using mergerfs into a single mount point. You can see the full list of options in the mergerfs docs.
After editing your fstab file, save and reboot. If all goes well, your server will boot back up and you can create new files in this shared directory. After creating a new file, go check the individual disks to see which one the file landed on.
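If you’d rather verify things without a full reboot, you can mount the new entries directly and check where a test file lands. A quick sanity check, assuming the mount points above (the file name is just an example):

```bash
# Mount everything defined in /etc/fstab
sudo mount -a

# Write a test file through the mergerfs pool
touch /srv/storage/test.txt

# See which underlying disk mergerfs actually placed it on
ls -l /mnt/disk*/test.txt
```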
SnapRAID
So you now have data sitting across all your different drives. And thanks to mergerfs it shows up in a single combined mount point so you can easily access it without thinking. But what happens if a drive fails? How could you recover the data?
That’s where SnapRAID comes into the picture. It allows you to set a disk aside to use as a “parity” disk. This disk won’t store any of your data; instead, it holds parity information so that if any disk dies, the data can be recovered. One caveat: the parity drive must be at least as large as your biggest data disk.
So let’s say we had the following disks:
- 2 TB Drive
- 1 TB Drive
- 4 TB Drive
- 4 TB Drive
- 2 TB Drive
Then we could select one of the 4 TB drives as the parity disk and still have 9 TB of usable storage (2 TB + 1 TB + 4 TB + 2 TB) across the remaining disks.
One caveat to keep in mind: SnapRAID does not compute parity “on write”, meaning it doesn’t happen automatically every time you write a new file. Instead, you must invoke the snapraid command-line tool to recalculate the parity. Luckily, there is an easy way to automate this, which I’ve included below.
Setting it Up
To set up SnapRAID, I’ll be giving installation instructions for Ubuntu 18.04. Check out the online documentation for more information.
First off, add a PPA that holds SnapRAID binaries, update your apt cache and install the package.
```bash
sudo add-apt-repository ppa:tikhonov/snapraid
sudo apt update
sudo apt install snapraid
```
Next, you’ll want to edit the SnapRAID configuration file at `/etc/snapraid.conf` (the same path referenced by the snapraid-runner config later on). Here is a simple one to get you started:
```
# Defines the file to use as parity storage
# It must NOT be on a data disk
# Format: "parity FILE_PATH"
parity /mnt/parity/snapraid.parity

# Defines the files to use as content lists
# You can use multiple specifications to store more copies
# You must have at least one copy for each parity file, plus one;
# more don't hurt
# They can be on the disks used for data, parity or boot,
# but each file must be on a different disk
# Format: "content FILE_PATH"
content /var/snapraid.content
content /mnt/disk0/.snapraid.content
content /mnt/disk1/.snapraid.content
content /mnt/disk2/.snapraid.content

# Defines the data disks to use
# The order is relevant for parity, do not change it
# Format: "disk DISK_NAME DISK_MOUNT_POINT"
disk d0 /mnt/disk0
disk d1 /mnt/disk1
disk d2 /mnt/disk2

# Excludes hidden files and directories (uncomment to enable)
#nohidden

# Defines files and directories to exclude
# Remember that all paths are relative to the mount points
# Format: "exclude FILE"
# Format: "exclude DIR/"
# Format: "exclude /PATH/FILE"
# Format: "exclude /PATH/DIR/"
exclude *.unrecoverable
exclude /tmp/
exclude /lost+found/
exclude *.!sync
exclude .AppleDouble
exclude ._AppleDouble
exclude .DS_Store
exclude ._.DS_Store
exclude .Thumbs.db
exclude .fseventsd
exclude .Spotlight-V100
exclude .TemporaryItems
exclude .Trashes
exclude .AppleDB
```
Reading through the comments in the file should give you some background on the settings. At a high level:
- We define where to keep the parity file. This will be the only file on the mounted parity drive.
- We define where to keep the `snapraid.content` file. I keep one copy on my host operating system (in `/var`), as well as copies on all my data disks. Note that I don’t put them in the mergerfs pool; I make sure they are explicitly on each drive. You must have one copy of the file per parity disk, plus one.
- We define the actual drives to keep parity in check on. These should be the roots of the drive mounts.
- Finally, we exclude a bunch of files from parity calculations. These are mostly temporary files that it doesn’t make sense to back up.
Once that is all set up, try running `snapraid sync`. If your configuration file and disks are set up correctly, your parity will be calculated and written to the parity disk. This initial sync can take a long time depending on the size of your drives.
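Once the initial sync completes, a few commands are handy for day-to-day checks and recovery. A short sketch; the disk name `d1` matches the config above, and in a real recovery you’d target whichever disk actually failed:

```bash
# Show what has changed on the data disks since the last sync
sudo snapraid diff

# Summarize the state of the array
sudo snapraid status

# If a data disk dies: replace it, mount the new disk at the
# same mount point, then rebuild its contents from parity
sudo snapraid fix -d d1
```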
Automating with SnapRAID Runner
Like I mentioned when introducing SnapRAID, parity isn’t updated “on write”, meaning you need to run `snapraid sync` every time you want to back up your drives. Luckily, there is a great tool called SnapRAID Runner that allows you to automate this.
SnapRAID Runner is a Python script that runs SnapRAID for you and will email you if any issues crop up. To install, clone the git repo into a suitable location.
```bash
sudo git clone https://github.com/Chronial/snapraid-runner.git /opt/snapraid-runner
```
Then, create a configuration file in a sensible place (I put mine in `/etc/snapraid-runner.conf`) and configure it as shown below.
```ini
[snapraid]
; path to the snapraid executable (e.g. /bin/snapraid)
executable = /usr/bin/snapraid
; path to the snapraid config to be used
config = /etc/snapraid.conf
; abort operation if there are more deletes than this, set to -1 to disable
deletethreshold = 40
; set to true if you want touch to be run each time
touch = false

[logging]
; logfile to write to, leave empty to disable
file = /var/log/snapraid.log
; maximum logfile size in KiB, leave empty for infinite
maxsize = 5000

[email]
; when to send an email, comma-separated list of [success, error]
sendon = success,error
; set to false to get full program output via email
short = true
subject = [SnapRAID] Status Report:
from = postmaster@demo1234.mailgun.org
to = your_email@example.com
; maximum email size in KiB
maxsize = 500

[smtp]
host = smtp.mailgun.org
; leave empty for default port
port = 465
; set to "true" to activate (port 465 is an SSL port)
ssl = true
tls = false
user = postmaster@demo1234.mailgun.org
password = mailgunpassword

[scrub]
; set to true to run scrub after sync
enabled = true
percentage = 22
older-than = 12
```
The configuration file does the following:
- Tells snapraid-runner where SnapRAID is installed and where the configuration file is.
- The `deletethreshold` tells snapraid-runner to cancel the sync if more than 40 files have been deleted.
- The email and smtp sections can be set up with an SMTP server to send you an email if the sync fails. I use Mailgun for this; their free account should get you pretty far.
- We also enable “scrub”, which checks for parity errors across the disks.
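Before scheduling anything, it’s worth running snapraid-runner once by hand to confirm the paths and SMTP settings are correct. Note this performs a real sync and sends the status email:

```bash
sudo python /opt/snapraid-runner/snapraid-runner.py --conf /etc/snapraid-runner.conf
```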
Finally, we can add snapraid-runner to a cron job so it runs automatically on a set schedule. I have mine set up to run on Sundays at 4:30 AM; of course, you can adjust this to whatever makes sense for you.
Open your crontab by running `sudo crontab -e` and then add the following line to call snapraid-runner. Adjust the paths to wherever you installed snapraid-runner and its configuration file.
```
30 4 * * 0 python /opt/snapraid-runner/snapraid-runner.py --conf /etc/snapraid-runner.conf
```
Save the crontab and you should be good to go! SnapRAID will now run on a set schedule to back up your drives.
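If you want to double-check that everything is wired up, you can list the active crontab and watch the log file configured earlier:

```bash
# Confirm the cron entry was saved
sudo crontab -l

# Follow the snapraid-runner log during a run
tail -f /var/log/snapraid.log
```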
Conclusions
If you’re like me, you’ve probably collected drives of various sizes and brands over the years, and the flexibility of mergerfs and SnapRAID makes it easy for home-labbers to build a data pool out of the disks they already have laying around. You don’t need to go out and buy new disks of all the same size to get a good storage solution.
I’m just scratching the surface here of the cool utilities for managing data on your home server. There are tools like rsnapshot for backing up remote servers, and Ansible for keeping these configuration files in a git repository and deploying them. And of course, there are things like Samba and NFS for sharing the pool on your network.
If you found this MergerFS + SnapRAID tutorial helpful, check out some of my other articles:
- Self Host Password Management: Bitwarden
- Start Docker Compose using systemd on Debian
- Raspberry Pi Video Surveillance Monitor
Please consider supporting the blog by joining my mailing list, following the blog on social media or directly through Buy Me a Coffee. All of those really make a difference, thanks for reading!