ddup (Detect Duplicates) is an extremely fast tool that identifies potentially duplicated files on
Windows NTFS volumes.
ddup C:
ddup C: -m **\*.dmp -i
Output:
Scanning drive C: with matcher `**/*.dmp` (case-insensitive) [Fuzzy comparison]
Generating recursive dirlist
Grouping by file size
Grouping by hash
Potential duplicates [17468 bytes]
1 C:\ProgramData\Microsoft\Windows\Containers\Dumps\29292c13-143c-4070-98b5-7e12e2afddfc.dmp
2 C:\Windows\LiveKernelReports\NDIS-20180504-0002.dmp
Finished in 7.5456786 seconds
Install from crates.io (Not yet available)
cargo install ddup
Install from repository:
cargo install --path .
This tool is written in Rust.
ddup obtains a recursive dirlist by leveraging the NTFS USN Journal mechanism
to obtain USN records for MFT (Master File Table) entries.
The Windows API exposes this functionality through the following IOCTLs:
* FSCTL_ENUM_USN_DATA
* FSCTL_QUERY_USN_JOURNAL
The USN records represent either files or directories, each linking to its parent, so resolving the full path
of a file requires the equivalent of an SQL "recursive join" over the records (implemented via a HashMap).
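The "recursive join" above can be sketched roughly as follows. This is a minimal illustration, not ddup's actual code: the `UsnEntry` struct, its field names, and the `resolve_path` function are hypothetical, and the sketch assumes that any parent reference number missing from the map stands for the volume root.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of a USN record: only the fields needed
/// to rebuild a path. The names are illustrative, not ddup's own.
#[derive(Debug, Clone)]
pub struct UsnEntry {
    /// File reference number of the parent directory.
    pub parent_frn: u64,
    /// Name of this file or directory, without any path components.
    pub name: String,
}

/// Resolve the full path of `frn` by following parent links until a
/// reference number is reached that is not in the map (assumed here to be
/// the volume root). Returns `None` if `frn` is unknown or the parent
/// links form a cycle.
pub fn resolve_path(entries: &HashMap<u64, UsnEntry>, drive: &str, frn: u64) -> Option<String> {
    let mut parts: Vec<&str> = Vec::new();
    let mut current = frn;
    // An acyclic chain can visit at most `entries.len()` records, plus one
    // final lookup that misses the map.
    for _ in 0..=entries.len() {
        match entries.get(&current) {
            Some(entry) => {
                parts.push(entry.name.as_str());
                current = entry.parent_frn;
            }
            None => {
                if parts.is_empty() {
                    return None; // the starting record itself is unknown
                }
                parts.reverse();
                return Some(format!("{}\\{}", drive, parts.join("\\")));
            }
        }
    }
    None // every lookup hit the map, so the links must loop
}
```

Keying the map by file reference number makes each parent lookup a constant-time operation on average, so resolving one path costs time proportional to its depth rather than to the number of records.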
After the full paths are resolved, we start comparing the files by using several iterations:
* Find groups of files that have the same size
* Compare files using fuzzy hashing
Files reported in the same group are most probably identical, although this is not strictly guaranteed.
To guarantee total equivalence, use the --strict flag (note that this greatly impacts performance).
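The two comparison passes described above can be sketched as follows. This is an illustration under stated assumptions, not ddup's implementation: the README does not specify how the fuzzy hash is computed, so the sketch substitutes a hash of only the first `SAMPLE_BYTES` bytes of each file. The names `quick_hash`, `potential_duplicates`, and `SAMPLE_BYTES` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs::{self, File};
use std::hash::Hasher;
use std::io::{self, Read};
use std::path::{Path, PathBuf};

/// Assumed sample size for the quick hash; not a value taken from ddup.
const SAMPLE_BYTES: u64 = 4096;

/// Stand-in for the fuzzy hash: hash at most the first `SAMPLE_BYTES`
/// bytes of the file instead of its whole content.
fn quick_hash(path: &Path) -> io::Result<u64> {
    let mut buf = Vec::new();
    File::open(path)?.take(SAMPLE_BYTES).read_to_end(&mut buf)?;
    let mut hasher = DefaultHasher::new();
    hasher.write(&buf);
    Ok(hasher.finish())
}

/// Pass 1: bucket files by size. Pass 2: split each bucket by quick hash.
/// Only groups with at least two members are returned.
pub fn potential_duplicates(paths: &[PathBuf]) -> io::Result<Vec<Vec<PathBuf>>> {
    let mut by_size: HashMap<u64, Vec<PathBuf>> = HashMap::new();
    for p in paths {
        by_size.entry(fs::metadata(p)?.len()).or_default().push(p.clone());
    }

    let mut groups = Vec::new();
    for (_, same_size) in by_size {
        // A file with a unique size cannot have a duplicate, so it is
        // never opened or hashed.
        if same_size.len() < 2 {
            continue;
        }
        let mut by_hash: HashMap<u64, Vec<PathBuf>> = HashMap::new();
        for p in same_size {
            by_hash.entry(quick_hash(&p)?).or_default().push(p);
        }
        groups.extend(by_hash.into_values().filter(|g| g.len() > 1));
    }
    Ok(groups)
}
```

Grouping by size first is cheap because it needs only metadata, and it removes most files before any content is read. Because the stand-in hash samples only part of each file, two files that share a size and a leading block but differ later would still land in the same group, which is why such results can only be called potential duplicates.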