multi-machine-dedup

About

multi-machine-dedup is a deduplication tool that stores its file index in SQLite, so that indexes built on different machines can be compared.

multi-machine-dedup is an EDLA project.

The purpose of edla.org is to promote the state of the art in various domains.

Installation

cargo install multi-machine-dedup

How to use it

Index recursively a directory labelled with a \

Check a directory:

multi-machine-dedup check-integrity -l <LABEL> --db <SQLITE_FILE>

Compare two databases:

multi-machine-dedup compare --db1 <SQLITE_FILE_1> --db2 <SQLITE_FILE_2>
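As a rough illustration of what comparing two index databases by hand could look like, the sketch below attaches one SQLite file to another and lists content present in both. It is not the implementation of the compare subcommand. The schema is an assumption: a `file` table with `label`, `full_path`, `hash` and `size` columns, inferred from the SQL examples further down in this README. The helper names `make_db` and `common_files`, the file names and the sample rows are all hypothetical.

```python
import os
import sqlite3
import tempfile


def make_db(path, rows):
    """Create a toy database with a hypothetical `file` table (assumed schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE file (label TEXT, full_path TEXT, hash TEXT, size INTEGER)"
    )
    conn.executemany("INSERT INTO file VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    conn.close()


def common_files(db1_path, db2_path):
    """Return (path in db1, path in db2, hash) for hashes found in both databases."""
    conn = sqlite3.connect(db1_path)
    try:
        # ATTACH makes the second database queryable under the schema name `other`.
        conn.execute("ATTACH DATABASE ? AS other", (db2_path,))
        return conn.execute(
            """
            SELECT a.full_path, b.full_path, a.hash
            FROM main.file AS a
            JOIN other.file AS b ON a.hash = b.hash
            ORDER BY a.full_path, b.full_path
            """
        ).fetchall()
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    db1 = os.path.join(tmp, "machine1.db")
    db2 = os.path.join(tmp, "machine2.db")
    make_db(db1, [
        ("laptop", "/home/a/photo.jpg", "aaaa", 5000),
        ("laptop", "/home/a/notes.txt", "cccc", 10),
    ])
    make_db(db2, [
        ("nas", "/backup/photo.jpg", "aaaa", 5000),
        ("nas", "/backup/video.mp4", "bbbb", 9000),
    ])
    shared = common_files(db1, db2)
    print(shared)
```

Joining on the hash alone treats two files as identical whenever their hashes match; with a short checksum, also comparing `size` would reduce false positives.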

Example of SQL queries

You can use a convenient database tool like DBeaver CE or SQLiteStudio to query the generated SQLite database.

Find the most duplicated files larger than a given size:

select label, full_path, file.hash, size, nb_dup
from file,
     (select hash, count(*) as nb_dup
      from file
      where size > <A_SIZE>
      group by hash
      order by nb_dup DESC, size DESC) as T
where file.hash = T.hash
order by nb_dup DESC, size DESC;
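To see what this query returns, here is a minimal sketch that runs it against a toy in-memory database. The table definition is an assumption limited to the columns the query names (`label`, `full_path`, `hash`, `size`); the real schema may differ, and the hash is stored as text here purely for readability. The function name `top_duplicates` and the sample rows are hypothetical. In the sketch, `hash` is written as `file.hash` in the outer select, because both `file` and `T` expose a column of that name.

```python
import sqlite3

# Hypothetical minimal schema: only the columns the query refers to.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file (label TEXT, full_path TEXT, hash TEXT, size INTEGER)"
)
conn.executemany(
    "INSERT INTO file VALUES (?, ?, ?, ?)",
    [
        ("laptop", "/home/a/photo.jpg", "aaaa", 5000),
        ("laptop", "/home/a/copy/photo.jpg", "aaaa", 5000),
        ("nas", "/backup/photo.jpg", "aaaa", 5000),
        ("laptop", "/home/a/video.mp4", "bbbb", 9000),
        ("nas", "/backup/video.mp4", "bbbb", 9000),
        ("laptop", "/home/a/notes.txt", "cccc", 10),
        ("nas", "/backup/notes.txt", "cccc", 10),
    ],
)


def top_duplicates(conn, min_size):
    """Run the duplicate query with the size threshold bound as a parameter."""
    query = """
        select label, full_path, file.hash, size, nb_dup
        from file,
             (select hash, count(*) as nb_dup
              from file
              where size > ?
              group by hash) as T
        where file.hash = T.hash
        order by nb_dup DESC, size DESC
    """
    return conn.execute(query, (min_size,)).fetchall()


rows = top_duplicates(conn, 1000)
for row in rows:
    print(row)
```

Note that a hash that occurs only once still appears in the result with `nb_dup` equal to 1; adding `having count(*) > 1` to the subquery would restrict the output to actual duplicates.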

Find all files with the same hash:

select * from file where hash = <A_CRC_VALUE>;
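When this lookup is scripted rather than typed into a database tool, binding the hash as a query parameter avoids quoting mistakes. The sketch below assumes a `file` table with `label`, `full_path`, `hash` and `size` columns, as suggested by the queries in this section; the hash is stored as text for illustration only, and the function name `copies_of` and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical minimal schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file (label TEXT, full_path TEXT, hash TEXT, size INTEGER)"
)
conn.executemany(
    "INSERT INTO file VALUES (?, ?, ?, ?)",
    [
        ("laptop", "/home/a/photo.jpg", "aaaa", 5000),
        ("nas", "/backup/photo.jpg", "aaaa", 5000),
        ("nas", "/backup/video.mp4", "bbbb", 9000),
    ],
)


def copies_of(conn, hash_value):
    """Return (label, full_path) for every indexed file with the given hash."""
    return conn.execute(
        "select label, full_path from file where hash = ? "
        "order by label, full_path",
        (hash_value,),
    ).fetchall()


copies = copies_of(conn, "aaaa")
print(copies)
```

Selecting the label alongside the path shows which labelled directory each copy was indexed from.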

Find all files with the image/jpeg MIME type:

select * from hash where mime like 'image/jpeg';

Tips

Roadmap

Inspired by https://github.com/hgrecco/dedup, multi-machine-dedup will probably offer similar features.

License

© 2022 Olivier ROLAND. Distributed under the GPLv3 License.