str-distance


A crate to evaluate distances between strings (and others).

Heavily inspired by the Julia package StringDistances.

Distance Metrics

Usage

The str_distance::str_distance* convenience functions.

`str_distance` and `str_distance_normalized` take the two input strings and determine their distance using the passed `DistanceMetric`. `str_distance_normalized` evaluates the normalized distance between two strings. A value of `0.0` corresponds to zero distance, meaning the metric considers both strings equal, whereas a value of `1.0` corresponds to the maximum distance that can exist between the strings.

Calling the `str_distance::str_distance*` functions is just a convenience for `DistanceMetric.str_distance*("", "")`.

Example

Levenshtein metrics can be given a maximum distance. Once the distance is known to exceed it, calculation of the exact distance is aborted early.

Distance

```rust
use str_distance::*;

// calculate the exact distance
assert_eq!(str_distance("kitten", "sitting", Levenshtein::default()), DistanceValue::Exact(3));

// short circuit if distance exceeds 10
let s1 = "Wisdom is easily acquired when hiding under the bed with a saucepan on your head.";
let s2 = "The quick brown fox jumped over the angry dog.";
assert_eq!(str_distance(s1, s2, Levenshtein::with_max_distance(10)), DistanceValue::Exceeded(10));
```
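To illustrate how an early abort of this kind can work, here is a minimal, dependency-free sketch. It is not the crate's implementation: the function name `levenshtein_bounded` and the `Option` return type are assumptions made for this example, standing in for the `Exact` and `Exceeded` outcomes shown above.

```rust
/// Levenshtein distance between `a` and `b`, giving up as soon as the
/// result is known to exceed `max`. Returns `None` in that case.
/// (Hypothetical helper, not part of the str_distance API.)
fn levenshtein_bounded(a: &str, b: &str, max: usize) -> Option<usize> {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // The length difference is a lower bound on the distance.
    if a.len().abs_diff(b.len()) > max {
        return None;
    }

    // At the start of iteration i, prev[j] is the distance between the
    // first i chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = Vec::with_capacity(b.len() + 1);
        cur.push(i + 1);
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = cur[j] + 1;
            cur.push(substitution.min(deletion).min(insertion));
        }
        // Row minima never decrease from one row to the next, so once a
        // whole row exceeds `max` the final result must exceed it too.
        if cur.iter().min().map_or(false, |&m| m > max) {
            return None;
        }
        prev = cur;
    }

    let distance = prev[b.len()];
    if distance > max { None } else { Some(distance) }
}
```

With this sketch, `levenshtein_bounded("kitten", "sitting", 10)` evaluates to `Some(3)`, while the two long sentences from the example above return `None` for a limit of 10 before any row is computed, because their lengths already differ by more than 10.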

Normalized Distance

```rust
use str_distance::*;

assert_eq!(str_distance_normalized("", "", Levenshtein::default()), 0.0);
assert_eq!(str_distance_normalized("nacht", "nacht", Levenshtein::default()), 0.0);
assert_eq!(str_distance_normalized("abc", "def", Levenshtein::default()), 1.0);
```
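One way such a normalization can be computed is sketched below without any dependencies. It assumes the raw Levenshtein distance is divided by the length of the longer input, which is consistent with the three results above but is an assumption about the crate's internals. Both function names are hypothetical.

```rust
/// Plain Levenshtein distance over Unicode scalar values.
/// (Hypothetical helper, not part of the str_distance API.)
fn levenshtein(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    // Single-row dynamic programming table, initialised for an empty `a`.
    let mut row: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        // `diag` is the previous row's value at column j.
        let mut diag = row[0];
        row[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let substitution = diag + usize::from(ca != *cb);
            diag = row[j + 1];
            row[j + 1] = substitution.min(row[j] + 1).min(diag + 1);
        }
    }
    row[b.len()]
}

/// Distance scaled into [0.0, 1.0] by the length of the longer input.
/// Two empty strings are treated as identical (0.0).
fn levenshtein_normalized(a: &str, b: &str) -> f64 {
    let longest = a.chars().count().max(b.chars().count());
    if longest == 0 {
        return 0.0;
    }
    levenshtein(a, b) as f64 / longest as f64
}
```

Under this assumption, "kitten" and "sitting" have a raw distance of 3 and a longer length of 7, so the normalized distance is 3/7.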

The DistanceMetric trait

```rust
use str_distance::{DistanceMetric, SorensenDice};

// QGram metrics require the length of the fragments (q-grams) used for comparison.
// For `SorensenDice` the default is 2.
assert_eq!(SorensenDice::new(2).str_distance("nacht", "night"), 0.75);
```

`DistanceMetric` was designed for `str` types, but is not limited to them. Distances can be calculated for all data types that are comparable and passed as `IntoIterator`, e.g. as `Vec`.

```rust
use str_distance::{DistanceMetric, Levenshtein, DistanceValue};

assert_eq!(*Levenshtein::default().distance(&[1,2,3], &[1,2,3,4,5,6]), 3);
```
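The idea of accepting anything iterable with comparable items can be sketched with plain generics. The following is only an illustration of that design, not the trait's actual signature: `levenshtein_generic` and its bounds are assumptions chosen for this example.

```rust
/// Levenshtein distance over any two sequences whose items can be
/// compared with each other.
/// (Hypothetical helper, not part of the str_distance API.)
fn levenshtein_generic<A, B>(a: A, b: B) -> usize
where
    A: IntoIterator,
    B: IntoIterator,
    A::Item: PartialEq<B::Item>,
{
    // The second sequence is buffered because it is scanned once per
    // item of the first sequence.
    let b: Vec<B::Item> = b.into_iter().collect();
    let mut row: Vec<usize> = (0..=b.len()).collect();
    for (i, item_a) in a.into_iter().enumerate() {
        // `diag` is the previous row's value at column j.
        let mut diag = row[0];
        row[0] = i + 1;
        for (j, item_b) in b.iter().enumerate() {
            let substitution = diag + usize::from(item_a != *item_b);
            diag = row[j + 1];
            row[j + 1] = substitution.min(row[j] + 1).min(diag + 1);
        }
    }
    row[b.len()]
}
```

The same function then works for `vec![1u8, 2, 3]` against `vec![1u8, 2, 3, 4, 5, 6]` (distance 3), for `"kitten".chars()` against `"sitting".chars()` (distance 3), or for word-level comparison via `split(' ')`.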

Documentation

Full docs available at docs.rs

References

License

Licensed under either of these: