A convenient splitting and concatenating filesystem.
While setting up a cloud-based backup and archive solution, I encountered the following phenomenon: many small files would get uploaded quite fast and, depending on the actual cloud storage provider, highly concurrently, while big files tended to slow down the whole process. The explanation is simple: many cloud storage providers do not support concurrent or chunked uploads of a single file, and some do not even support resuming a partial upload. You have to upload a file in one go, sequentially, one byte at a time; it's all or nothing.
Now consider a scenario where you upload a huge file, like a mirror of your Raspberry Pi's SD card with the system and configuration on it. I have such a file; it is about 4 GB in size. While backing up my system, this was the last file to be uploaded. According to the ETA calculation, it would have taken several hours, so I let it run overnight. The next morning I found out that after around 95% of the upload, my internet connection had vanished for just a few seconds, but long enough for the transfer tool to abort the upload. The temporary file got deleted from the cloud storage, so I had to start from zero again. Several hours of uploading wasted.
I thought about splitting big files so that I could upload them more efficiently, but I came to the conclusion that manually splitting files, uploading them, and deleting the local copies afterwards is not a very scalable solution.
So I came up with the idea of a special filesystem: one that would present big files as if they were many small chunks in separate files. In reality, the chunks would all point to the same physical file, only at different offsets. This way I could upload the chunks in parallel and, even if an upload got aborted midway, lose only a little progress.
SplitFS was born.
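To make the idea concrete: a chunk is nothing more than a fixed-size window into the original file. In shell terms, serving chunk number N amounts to a ranged read, roughly what the following `dd` invocation does (the file name is made up for illustration; 2 MB is the default chunk size):

```text
# Conceptually, reading chunk 3 of a file is just a ranged read
# at offset 3 * 2 MB. SplitFS does this virtually, without
# creating any physical chunk files.
dd if=disk.img of=chunk3 bs=2M skip=3 count=1
```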
If I downloaded such chunked file parts, I would need to call `cat * > file` afterwards to re-create the actual file. That seems like a hassle similar to splitting files manually. That's why I also had CatFS in mind when developing SCFS. CatFS concatenates chunked files transparently and presents them as complete files again.
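For illustration, this is the manual reassembly step that CatFS makes unnecessary (names are hypothetical, and it assumes the shell expands the chunk file names in the right order):

```text
cd disk.img                # the downloaded chunk directory
cat * > ../restored.img    # stitch the chunks back together
```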
CatFS is included in SCFS since version 0.4.0.
I am relatively new to Rust, and I thought the best way to deepen my understanding of the language was to take on a project that requires dedication and a certain level of knowledge.
SCFS can be installed easily through Cargo via crates.io:

```text
cargo install scfs
```
To mount a directory with SplitFS, use the following form:
```text
scfs --mode=split <base directory> <mount point>
```
Since version 0.8.0, this can be simplified by using the dedicated `splitfs` binary:

```text
splitfs <base directory> <mount point>
```
The directory specified as `mount point` will now reflect the content of `base directory`, replacing each regular file with a directory that contains enumerated chunks of that file as separate files.
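For example, with a base directory containing a single large file, the mounted view might look roughly like this (the file name is hypothetical, and the exact chunk naming is up to SplitFS):

```text
$ ls base
disk.img
$ splitfs base mnt
$ ls mnt/disk.img
0  1  2  3  ...  2047      # a 4 GB file in 2 MB chunks
```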
Since version 0.7.0, it is possible to use a custom block size for the file fragments. For example, to use 1 MB chunks instead of the default size of 2 MB, you would go with:
```text
splitfs --blocksize=1048576 <base directory> <mount point>
```
Here, 1048576 is 1024 * 1024, that is, one megabyte in bytes.
You can even leverage the arithmetic capabilities of your shell, for example in Bash:
```text
splitfs --blocksize=$((1024 * 1024)) <base directory> <mount point>
```
You can actually go as far as to set a block size of one byte, but be prepared for a ridiculous amount of overhead or maybe even a system freeze because the metadata table grows too large.
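To get a feeling for the numbers: the metadata table needs roughly one entry per chunk, and the chunk count is the file size divided by the block size. A quick sanity check in the shell:

```text
$ echo $((4 * 1024 * 1024 * 1024 / (2 * 1024 * 1024)))   # 4 GB at 2 MB blocks
2048
$ echo $((4 * 1024 * 1024 * 1024 / 1))                   # 4 GB at 1-byte blocks
4294967296
```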
To mount a directory with CatFS, use the following form:
```text
scfs --mode=cat <base directory> <mount point>
```
Since version 0.8.0, this can be simplified by using the dedicated `catfs` binary:

```text
catfs <base directory> <mount point>
```
Please note that `base directory` needs to be a directory structure that has been generated by SplitFS; CatFS will refuse to mount the directory otherwise.
The directory specified as `mount point` will now reflect the content of `base directory`, replacing each directory of chunked files with a single complete file.
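Continuing the hypothetical example from above, mounting the SplitFS tree with CatFS restores the original view:

```text
$ catfs mnt cat-mnt
$ ls cat-mnt
disk.img        # presented as one single file again
```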
Since v0.8.0 it is possible to pass additional mount options to the underlying FUSE library.
SCFS supports two ways of specifying mount options: via the `-o` option, or via additional arguments after a `--` separator. This is in accordance with other FUSE-based filesystems like EncFS.
These two calls are equivalent:
```text
scfs --mode=split -o nonempty mirror mountpoint
scfs --mode=split mirror mountpoint -- nonempty
```
Of course, these methods also work with the `splitfs` and `catfs` binaries.
I no longer consider this project a raw prototype, and I am dogfooding it: I use it in my own backup strategy and build features based on my personal needs.
However, this might not meet the needs of the typical user, and without feedback I might not even think of some scenarios to begin with.
Specifically, these are the current limitations of SCFS:
- It should work on all UNIX-based systems, like Linux and perhaps some macOS versions, though without macOS-specific file attributes. It will definitely not work on Windows, since that would require special handling of system calls, which I have not had time to take care of yet.
- It only works with directories, regular files, and symlinks (symlink support since v0.8.0). All other file types (device files, pipes, and so on) are silently ignored. Since v0.8.0, unsupported file types no longer cause a panic.
- The base directory will be mounted read-only in the new mount point, and SCFS expects that the base directory will not be altered while mounted.