BTRFS Dedupe

This is a BTRFS deduplication utility. It operates in batch mode: it scans for files of the same size, computes a SHA256 hash of each one, then invokes the kernel deduplication ioctl on all those that match.

It is written by James Pharaoh.

It is hosted at [gitlab.wellbehavedsoftware.com](https://gitlab.wellbehavedsoftware.com/well-behaved-software/wbs-backup/tree/master/btrfs-dedupe); please report any issues or feature requests there.

It is also available from the following locations:

General information

The utility is very simple. It takes a list of directories, scans for files with matching sizes, computes a SHA256 checksum of each one, then invokes the ioctl to deduplicate the entire file for every match it finds. Optionally, it can require filenames to match as well as sizes; this may make the program run faster in some cases, since fewer files end up being compared.
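The candidate-finding part of that process can be sketched in a few lines of Python. This is only an illustration of the idea, not the utility's actual code: the function names `sha256_of` and `find_duplicate_groups` are made up for this example, skipping empty files and symlinks is an assumption, and the final step of handing each group to the kernel deduplication ioctl is left out entirely.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(roots, match_filename=False):
    """Group files under `roots` that share a size and a SHA256 digest.

    Hypothetical helper: with match_filename=True, files must also share
    a base name before they are hashed against each other.
    """
    candidates = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Assumption: only regular, non-empty files are of interest.
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                size = os.path.getsize(path)
                if size == 0:
                    continue
                key = (size, name) if match_filename else (size,)
                candidates[key].append(path)

    groups = []
    for paths in candidates.values():
        if len(paths) < 2:
            continue  # a unique size (or size and name) cannot match anything
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha256_of(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Each returned group is a list of paths with identical size and checksum. Grouping by size first means files with a unique size are never read at all, and adding the filename to the key shrinks the groups further, which is why matching filenames can speed things up.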

Usage

From the built-in help:

```
$ btrfs-dedupe --help

Btrfs Dedupe

USAGE:
    btrfs-dedupe [FLAGS] []

FLAGS:
    -h, --help              Prints help information
        --match-filename    Match filename as well as checksum
    -V, --version           Prints version information

ARGS:
    ...    Root path to scan for files
```

Alternatives

There are two alternatives of which I am aware:

There is also [ongoing work](http://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg32862.html) to enable automatic realtime deduplication in the filesystem itself, but this is likely to take a long time to stabilise, and there are fundamental issues with the concept which make it unsuitable for many cases.

There is a wiki page with general information about the state of deduplication in BTRFS.