near-facsimile: Find similar or identical text files in a directory


Installation

Usage

Options

The following options are available:

Specifying the documentation directory

$ near-facsimile --path <path-to-directory>

Saving the CSV table to a different file

$ near-facsimile --csv <path-to-new-file>

Setting the lowest reported similarity threshold

The tool reports only files whose similarity is above a set threshold. The default threshold is 85.0, that is, 85% similar.

$ near-facsimile --threshold=<85.0>
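As a rough illustration of what the threshold does, the following Python sketch filters a list of comparison results. The file names, the scores, and the `report` function are all hypothetical, and treating the threshold as inclusive is an assumption rather than confirmed tool behavior.

```python
# Hypothetical comparison results: (file_a, file_b, similarity in percent).
comparisons = [
    ("intro.adoc", "overview.adoc", 97.2),
    ("intro.adoc", "install.adoc", 41.0),
    ("faq.adoc", "faq-old.adoc", 85.0),
]


def report(results, threshold=85.0):
    """Keep only the pairs whose similarity reaches the threshold.

    Assumption: the threshold is inclusive; the real tool may differ.
    """
    return [pair for pair in results if pair[2] >= threshold]


reported = report(comparisons)
for file_a, file_b, score in reported:
    print(f"{file_a} <-> {file_b}: {score:.1f}%")
```

Raising the threshold shortens the report; lowering it surfaces more loosely related files.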

Disregarding certain lines in files

You can configure the file comparison to skip all lines that match your regular expressions. The similarity is then calculated from the remaining lines, which match none of the regular expressions.

For example, skip all lines that start with //:

$ near-facsimile --skip-lines '^//'
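The effect of skipping lines can be sketched in Python as a filtering step that runs before the comparison. This is only an illustration of the idea, not the tool's implementation; the function name `strip_skipped_lines` and the sample text are hypothetical.

```python
import re


def strip_skipped_lines(text, patterns):
    """Return the text without any line that matches one of the patterns."""
    compiled = [re.compile(pattern) for pattern in patterns]
    kept = [
        line
        for line in text.splitlines()
        if not any(regex.search(line) for regex in compiled)
    ]
    return "\n".join(kept)


sample = (
    "// license header\n"
    "fn main() {}\n"
    "// trailing comment\n"
    "let x = 1; // inline"
)

# Only lines that *start* with // are dropped; the inline comment stays.
cleaned = strip_skipped_lines(sample, [r"^//"])
print(cleaned)
```

Because the pattern is anchored with `^`, a line that merely contains `//` further along is kept and still takes part in the comparison.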

Switching to a faster, less accurate comparison

By default, the tool uses the Levenshtein metric, which is accurate but rather slow. You can instead compare files using the Jaro metric, which finishes in around half the time, but produces less accurate statistics.

$ near-facsimile --fast
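To give a feel for why the Levenshtein metric is slow, here is a minimal Python sketch of the edit distance and one way to turn it into a percentage. The normalization by the longer string's length is an assumption made for illustration; the tool may compute its percentage differently.

```python
def levenshtein(a, b):
    """Edit distance between two strings, keeping only two table rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def similarity_percent(a, b):
    """Assumed normalization: 100 * (1 - distance / longer length)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


print(levenshtein("kitten", "sitting"))
print(round(similarity_percent("kitten", "sitting"), 1))
```

The nested loops visit every pair of characters, so the running time grows with the product of the two file lengths, which is why a cheaper metric pays off on large directories.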

If you specify the --fast option twice (-ff), the tool uses the relatively rudimentary but very fast trigram comparison instead:

$ near-facsimile --fast --fast
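A trigram comparison can be sketched in Python as follows. This version scores two texts by the Jaccard overlap of their character trigram sets, which is one common formulation and an assumption here; the tool's actual scoring may differ.

```python
def trigrams(text):
    """Set of all three-character substrings of the text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def trigram_similarity(a, b):
    """Shared trigrams as a percentage of all distinct trigrams (Jaccard)."""
    set_a, set_b = trigrams(a), trigrams(b)
    if not set_a and not set_b:
        return 100.0  # assumption: two texts too short for trigrams count as equal
    return 100.0 * len(set_a & set_b) / len(set_a | set_b)


print(sorted(trigrams("abcd")))
print(trigram_similarity("abcd", "abcd"))
```

Building the sets takes a single pass over each text, so the cost grows linearly with file size. The trade-off is that the order and repetition of trigrams are ignored, which is why the result is only a rough estimate.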