dsc

dsc is a command-line tool for locating duplicate files. The project is heavily inspired by the fd tool.

Usage

Summary

To get a quick overview of the amount of data that is duplicated, type dsc cmp in the current folder.

➜ dsc cmp
Duplicate data : 1.54GB
Total duplicates : 2,479
Total duplicate files : 5,164

For a faster but rough estimate of the duplicate data, use dsc cmp --estimate.

➜ dsc cmp --estimate --min-size 500KiB ~/git
Duplicate data : 1.07GB
Total duplicates : 234
Total duplicate files : 309

Report

For a more detailed overview, use dsc report. It lists the duplicate files in CSV (default) or JSON format.

➜ dsc report --min-size 500KiB ~/Downloads
"duplicate","identity","device","file_name","file_size"
0,0,0,"/home/user/Downloads/talon-linux (1)/talon/resources/python/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so",291681296
0,1,0,"/home/user/Downloads/talon-linux/talon/resources/python/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so",291681296
0,2,0,"/home/user/Downloads/talon-linux (2)/talon/resources/python/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so",291681296
1,0,0,"/home/user/Downloads/bloomrpc-1.3.1-x86_64(1).AppImage",189204096
1,1,0,"/home/user/Downloads/bloomrpc-1.3.1-x86_64.AppImage",189204096
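The CSV report is easy to post-process. The following Python sketch groups rows by the "duplicate" column and adds up how many bytes would be freed if each group kept a single copy. It assumes that rows sharing a "duplicate" value are copies of the same file, as the sample output above suggests; the helper names group_report and reclaimable_bytes are made up for this example and are not part of dsc.

```python
import csv
import io
from collections import defaultdict


def group_report(csv_text):
    """Group report rows by the "duplicate" column.

    Returns {group_id: [(file_name, file_size), ...]}.
    Assumption: rows with the same "duplicate" value are copies of one file.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[int(row["duplicate"])].append(
            (row["file_name"], int(row["file_size"]))
        )
    return dict(groups)


def reclaimable_bytes(groups):
    """Bytes freed if every group kept only one copy of its file."""
    return sum(files[0][1] * (len(files) - 1) for files in groups.values())


# Two rows taken from the sample report above.
sample = '''"duplicate","identity","device","file_name","file_size"
1,0,0,"/home/user/Downloads/bloomrpc-1.3.1-x86_64(1).AppImage",189204096
1,1,0,"/home/user/Downloads/bloomrpc-1.3.1-x86_64.AppImage",189204096
'''

groups = group_report(sample)
print(len(groups), reclaimable_bytes(groups))
```

For real data, read the text from wherever the report output was saved instead of using the inline sample.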

Link

Use dsc link to reclaim disk space by hard-linking duplicate files that reside on the same device.

➜ dsc link ~/Downloads
Are you sure you want to link 5,164 files? [y/N]: y
Done. Reclaimed 1.54GB of disk space.

To see what is going to happen before running link, you can run dsc link --dry-run:

➜ dsc link --dry-run ~/Downloads
(dryrun) Are you sure you want to link 5,164 files? [y/N]: y
linking "/home/user/Downloads/bloomrpc-1.3.1-x86_64(1).AppImage" => "/home/user/Downloads/bloomrpc-1.3.1-x86_64.AppImage"
...
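To picture what replacing a duplicate with a hard link involves, here is a minimal Python sketch. It is an illustration of the general technique, not dsc's actual implementation: the function name replace_with_hard_link and the temporary ".dsc-tmp" suffix are invented for this example. The device check reflects the fact that hard links cannot span devices.

```python
import os
import tempfile


def replace_with_hard_link(original, duplicate):
    """Replace `duplicate` with a hard link to `original`.

    Illustrative only; the temporary name below is a made-up convention.
    """
    if os.path.samefile(original, duplicate):
        return  # already the same underlying file, nothing to do
    if os.stat(original).st_dev != os.stat(duplicate).st_dev:
        raise OSError("hard links cannot span devices")
    tmp = duplicate + ".dsc-tmp"  # hypothetical temporary name
    os.link(original, tmp)        # create a second name for the original
    os.replace(tmp, duplicate)    # rename it over the duplicate


# Demonstration on two identical files in a scratch directory.
with tempfile.TemporaryDirectory() as scratch:
    first = os.path.join(scratch, "a.bin")
    second = os.path.join(scratch, "b.bin")
    for path in (first, second):
        with open(path, "wb") as handle:
            handle.write(b"same content")
    replace_with_hard_link(first, second)
    linked = os.path.samefile(first, second)
    print(linked)
```

Linking to a temporary name first and then renaming means the duplicate's path never disappears, even if the operation is interrupted partway through.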

Other

Full help is available by typing dsc help <command>. All commands can be listed by typing dsc.