Find, Sort, Filter & Delete duplicate files
```bash
Usage: deduplicator [OPTIONS] [scandirpath]

Arguments:
  [scandirpath]  Run Deduplicator on a dir different from pwd (e.g., ~/Pictures)

Options:
  -t, --types
```
```bash
deduplicator -t pdf,jpg,png -i
deduplicator ~/Pictures/ -t png,jpeg,jpg,pdf -i
deduplicator ~/Pictures --max-depth 0
deduplicator ~/.config --follow-links
deduplicator ~/Media --min-size 100mb
```
```bash
$ cargo install deduplicator
```
If you'd like to install with nightly features, you can use:

```bash
$ cargo install --git https://github.com/sreedevk/deduplicator
```
Please note that if you use a version manager to install Rust (like asdf), you need to reshim (`asdf reshim rust`).
For Linux, you can download the pre-built binary from the Releases page. Download the `deduplicator-x86_64-unknown-linux-gnu.tar.gz` tarball. Once you have the tarball with the executable, follow these steps to install:

```bash
$ tar -zxvf deduplicator-x86_64-unknown-linux-gnu.tar.gz
$ sudo mv deduplicator /usr/bin/
```
For macOS, you can download the pre-built binary from the Releases page. Download the `deduplicator-x86_64-apple-darwin.tar.gz` tarball. Once you have the tarball with the executable, follow these steps to install:

```bash
$ tar -zxvf deduplicator-x86_64-apple-darwin.tar.gz
$ sudo mv deduplicator /usr/bin/
```
For Windows, you can download the pre-built binary from the Releases page. Download the `deduplicator-x86_64-pc-windows-msvc.zip` file, unzip it, and move `deduplicator.exe` to a location in the `PATH` system environment variable.

Note: If you run into an MSVC error, please install MSVC from here.
Deduplicator uses size comparison and fxhash (a fast, non-cryptographic hashing algorithm) to quickly scan through large numbers of files and find duplicates. It is also highly parallel (it uses rayon and dashmap). I was able to scan through 120GB of files (videos, PDFs, images) in ~300ms. Check out the benchmarks below.
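The two-stage scan described above can be sketched in Rust. This is a minimal, single-threaded, std-only sketch, not Deduplicator's actual implementation: FNV-1a stands in for fxhash, a plain `HashMap` stands in for dashmap, and the rayon parallelism is omitted.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::PathBuf;

// FNV-1a: a simple non-cryptographic hash, used here as a stand-in for fxhash.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

// Stage 1: bucket paths by file size -- a file with a unique size cannot
// have a duplicate, so it is skipped without ever being read.
// Stage 2: hash only the files that share a size, and group by hash.
fn find_duplicates(paths: &[PathBuf]) -> Vec<Vec<PathBuf>> {
    let mut by_size: HashMap<u64, Vec<PathBuf>> = HashMap::new();
    for p in paths {
        if let Ok(meta) = fs::metadata(p) {
            by_size.entry(meta.len()).or_default().push(p.clone());
        }
    }
    let mut groups = Vec::new();
    for (_, candidates) in by_size {
        if candidates.len() < 2 {
            continue; // unique size => unique content, no hashing needed
        }
        let mut by_hash: HashMap<u64, Vec<PathBuf>> = HashMap::new();
        for p in candidates {
            if let Ok(bytes) = fs::read(&p) {
                by_hash.entry(fnv1a(&bytes)).or_default().push(p);
            }
        }
        groups.extend(by_hash.into_values().filter(|g| g.len() > 1));
    }
    groups
}

fn main() {
    // Create three small demo files, two with identical contents.
    let dir = std::env::temp_dir().join("dedup_demo");
    fs::create_dir_all(&dir).unwrap();
    let a = dir.join("a.txt");
    let b = dir.join("b.txt");
    let c = dir.join("c.txt");
    fs::write(&a, b"same contents").unwrap();
    fs::write(&b, b"same contents").unwrap();
    fs::write(&c, b"different!").unwrap();
    let dups = find_duplicates(&[a, b, c]);
    println!("duplicate groups: {}", dups.len());
}
```

The size pre-filter is what makes this fast on large trees: most files are eliminated by a cheap `metadata` call, and only size-colliding candidates are read and hashed.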
| Command | Dirsize | Filecount | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|:---|---:|---:|---:|---:|---:|
| `deduplicator ~/Data/tmp` | (~120G) | 721 files | 33.5 ± 28.6 | 25.3 | 151.5 | 1.87 ± 1.60 |
| `deduplicator ~/Data/books` | (~8.6G) | 1419 files | 24.5 ± 1.0 | 22.9 | 28.1 | 1.37 ± 0.08 |
| `deduplicator ~/Data/books --min-size 10M` | (~8.6G) | 1419 files | 17.9 ± 0.7 | 16.8 | 20.0 | 1.00 |
| `deduplicator ~/Data/ --types pdf,jpg,png,jpeg` | (~290G) | 104222 files | 1207.2 ± 37.0 | 1172.2 | 1287.7 | 67.27 ± 3.33 |
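A hyperfine invocation along these lines could reproduce such a comparison; the paths and the `--warmup` count here are illustrative, not taken from the original runs:

```bash
hyperfine --warmup 3 \
  'deduplicator ~/Data/books' \
  'deduplicator ~/Data/books --min-size 10M'
```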
These benchmarks were run using hyperfine. Here are the specs of the machine used to benchmark deduplicator:
```
OS: Arch Linux x86_64
Host: Precision 5540
Kernel: 5.15.89-1-lts
Uptime: 4 hours, 44 mins
Shell: zsh 5.9
Terminal: kitty
CPU: Intel i9-9880H (16) @ 4.800GHz
GPU: NVIDIA Quadro T2000 Mobile / Max-Q
GPU: Intel CoffeeLake-H GT2 [UHD Graphics 630]
Memory: 31731MiB (~32GiB)
```