dsc is a command-line tool for locating duplicate files. The project is heavily inspired by the fd tool.
To get a quick overview of the amount of data that is duplicated, type dsc cmp in the current folder.
➜ dsc cmp
Duplicate data : 1.54GB
Total duplicates : 2,479
Total duplicate files : 5,164
This process can be sped up by requesting a rough estimate of the duplicate data with dsc cmp --estimate.
➜ dsc cmp --estimate --min-size 500KiB ~/git
Duplicate data : 1.07GB
Total duplicates : 234
Total duplicate files : 309
For a more detailed overview, use dsc report. This will output the duplicate files in CSV (default) or JSON format.
➜ dsc report --min-size 500KiB ~/Downloads
"duplicate","identity","device","file_name","file_size"
0,0,0,"/home/user/Downloads/talon-linux (1)/talon/resources/python/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so",291681296
0,1,0,"/home/user/Downloads/talon-linux/talon/resources/python/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so",291681296
0,2,0,"/home/user/Downloads/talon-linux (2)/talon/resources/python/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so",291681296
1,0,0,"/home/user/Downloads/bloomrpc-1.3.1-x86_64(1).AppImage",189204096
1,1,0,"/home/user/Downloads/bloomrpc-1.3.1-x86_64.AppImage",189204096
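The CSV report can be post-processed with a short script. The sketch below is not part of dsc. It assumes, based only on the sample output above, that the "duplicate" column numbers each group of identical files and that "file_size" is in bytes. The function name summarize_report and the notion of "redundant bytes" (every copy beyond the first in a group) are illustrative choices, not dsc terminology.

```python
import csv
import io
from collections import defaultdict


def summarize_report(csv_text):
    """Summarize a dsc CSV report given as a string.

    Returns the number of duplicate groups, the number of files in
    those groups, and the bytes occupied by redundant copies.
    """
    # Collect file sizes per group, keyed by the "duplicate" column.
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["duplicate"]].append(int(row["file_size"]))

    # Assumption: one copy per group is kept, the rest are redundant.
    redundant = sum(sum(sizes[1:]) for sizes in groups.values())

    return {
        "groups": len(groups),
        "files": sum(len(sizes) for sizes in groups.values()),
        "redundant_bytes": redundant,
    }
```

To use it, redirect the report to a file (for example dsc report --min-size 500KiB ~/Downloads > report.csv), read the file into a string, and pass that string to summarize_report.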
To be implemented: Clean up disk space by creating hard links between files on the same device.
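Until that feature exists, the idea can be illustrated by hand. The following is a minimal sketch of replacing one duplicate with a hard link to another copy, not dsc's implementation. The function name replace_with_hard_link and the ".dsc-tmp" suffix are hypothetical. The sketch does not compare file contents, so it should only be given pairs that are already known to be identical, such as two rows from the same group in a report.

```python
import os


def replace_with_hard_link(original, duplicate):
    """Replace `duplicate` with a hard link to `original`.

    Returns True if a link was created, False if the files are on
    different devices or already share the same inode.
    """
    src = os.stat(original)
    dup = os.stat(duplicate)

    # Hard links cannot span devices (compare the "device" report column).
    if src.st_dev != dup.st_dev:
        return False

    # Nothing to do if both paths already point at the same inode.
    if src.st_ino == dup.st_ino:
        return False

    # Create the link under a temporary name first, then swap it into
    # place, so the duplicate path never disappears if linking fails.
    tmp = duplicate + ".dsc-tmp"  # hypothetical temporary name
    os.link(original, tmp)
    os.replace(tmp, duplicate)
    return True
```

After linking, both paths share one copy of the data, so a write through either path changes the content seen through the other. That is worth considering before linking files that are meant to diverge later.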
Full help is available by typing dsc help <command>. All commands can be listed by typing dsc.