A crate to evaluate distances between strings (and others).
Heavily inspired by the Julia StringDistances package.
Q-gram distances compare the set of all slices of length `q` in each `str`, where `q > 0`:

- `Qgram::new(usize)`
- `Cosine::new(usize)`
- `Jaccard::new(usize)`
- `SorensenDice::new(usize)`
- `Overlap::new(usize)`
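
To make the q-gram idea concrete, here is a minimal standard-library sketch of two set-based q-gram distances. It is not the crate's implementation: the names `qgram_set`, `sorensen_dice_distance` and `jaccard_distance` are hypothetical, and treating two inputs without any q-grams as equal is an assumption.

```rust
use std::collections::HashSet;

/// Collects the set of all q-grams (windows of `q` chars) of `s`.
/// Hypothetical helper for illustration; not part of the crate's API.
fn qgram_set(s: &str, q: usize) -> HashSet<Vec<char>> {
    assert!(q > 0, "q must be greater than zero");
    let chars: Vec<char> = s.chars().collect();
    // `windows` yields nothing when the input is shorter than `q`.
    chars.windows(q).map(|w| w.to_vec()).collect()
}

/// Sorensen-Dice distance: 1 - 2|A ∩ B| / (|A| + |B|).
fn sorensen_dice_distance(a: &str, b: &str, q: usize) -> f64 {
    let (sa, sb) = (qgram_set(a, q), qgram_set(b, q));
    if sa.is_empty() && sb.is_empty() {
        // Assumption: inputs without any q-grams count as equal.
        return 0.0;
    }
    let shared = sa.intersection(&sb).count() as f64;
    1.0 - 2.0 * shared / (sa.len() + sb.len()) as f64
}

/// Jaccard distance: 1 - |A ∩ B| / |A ∪ B|.
fn jaccard_distance(a: &str, b: &str, q: usize) -> f64 {
    let (sa, sb) = (qgram_set(a, q), qgram_set(b, q));
    let union = sa.union(&sb).count() as f64;
    if union == 0.0 {
        // Assumption: inputs without any q-grams count as equal.
        return 0.0;
    }
    let shared = sa.intersection(&sb).count() as f64;
    1.0 - shared / union
}
```

With `q = 2`, `"nacht"` and `"night"` each have four bigrams and share only `ht`, so the Sorensen-Dice sketch yields `1 - 2/8 = 0.75`, in line with the `SorensenDice` example in this README.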
The crate includes distance "modifiers" that can be applied to any distance.
The `str_distance::str_distance*` convenience functions

`str_distance` and `str_distance_normalized` take the two string inputs for which the distance is determined using the passed `DistanceMetric`. `str_distance_normalized` evaluates the normalized distance between two strings. A value of `0.0` corresponds to the "zero distance": both strings are considered equal by means of the metric. A value of `1.0` corresponds to the maximum distance that can exist between the strings.
Calling the `str_distance::str_distance*` functions is just a convenience for `DistanceMetric.str_distance*("", "")`.
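
The normalization described above can be sketched as a division of the raw distance by the largest distance the metric could produce for the two inputs. This is only an illustration: `normalize` and `levenshtein_max` are hypothetical names, and using the longer input's length in chars as the Levenshtein maximum is an assumption, not a statement about how the crate computes it.

```rust
/// Scales a raw distance into the range [0.0, 1.0].
/// `max` is the largest distance the metric can produce for the two inputs.
fn normalize(distance: usize, max: usize) -> f64 {
    if max == 0 {
        // No distance is possible (e.g. both inputs are empty): treat as equal.
        return 0.0;
    }
    distance as f64 / max as f64
}

/// Assumption: for Levenshtein the maximum distance is the length in chars
/// of the longer input, since every char can be substituted or inserted.
fn levenshtein_max(a: &str, b: &str) -> usize {
    a.chars().count().max(b.chars().count())
}
```

For `"abc"` and `"def"` the raw Levenshtein distance is 3 and the maximum is 3, which gives `1.0`; for two empty strings the sketch returns `0.0`.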
Levenshtein metrics let you define a maximum distance; once it is exceeded, the calculation of the exact distance is aborted early.
Distance
```rust
use str_distance::*;

// calculate the exact distance
assert_eq!(str_distance("kitten", "sitting", Levenshtein::default()), DistanceValue::Exact(3));

// short circuit if distance exceeds 10
let s1 = "Wisdom is easily acquired when hiding under the bed with a saucepan on your head.";
let s2 = "The quick brown fox jumped over the angry dog.";
assert_eq!(str_distance(s1, s2, Levenshtein::with_max_distance(10)), DistanceValue::Exceeded(10));
```
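
How such an early abort might work can be sketched with a row-by-row Levenshtein computation that stops as soon as the bound can no longer be met. This is a minimal illustration under stated assumptions, not the crate's algorithm: `Outcome` and `levenshtein_bounded` are hypothetical names that only mirror the `Exact` / `Exceeded` distinction shown above.

```rust
/// Hypothetical result type for this sketch.
#[derive(Debug, PartialEq)]
enum Outcome {
    Exact(usize),
    Exceeded(usize),
}

/// Levenshtein distance over chars that gives up once `max` is exceeded.
fn levenshtein_bounded(a: &str, b: &str, max: usize) -> Outcome {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // The length difference is a lower bound on the distance.
    if a.len().abs_diff(b.len()) > max {
        return Outcome::Exceeded(max);
    }

    // `prev` holds the distances between the processed prefix of `a`
    // and every prefix of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = usize::from(ca != cb);
            let best = (prev[j] + cost).min(prev[j + 1] + 1).min(curr[j] + 1);
            curr.push(best);
        }
        // Every edit path to the final cell crosses this row, so the final
        // distance cannot be smaller than the row minimum.
        if curr.iter().min().map_or(false, |&m| m > max) {
            return Outcome::Exceeded(max);
        }
        prev = curr;
    }

    let distance = prev[b.len()];
    if distance > max {
        Outcome::Exceeded(max)
    } else {
        Outcome::Exact(distance)
    }
}
```

The two sentences from the example above differ in length by more than 10 chars, so this sketch returns `Outcome::Exceeded(10)` before filling in a single row.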
Normalized Distance
```rust
use str_distance::*;

assert_eq!(str_distance_normalized("", "", Levenshtein::default()), 0.0);
assert_eq!(str_distance_normalized("nacht", "nacht", Levenshtein::default()), 0.0);
assert_eq!(str_distance_normalized("abc", "def", Levenshtein::default()), 1.0);
```
The `DistanceMetric` trait

```rust
use str_distance::{DistanceMetric, SorensenDice};

// QGram metrics require the length of the underlying fragment to use for comparison.
// For `SorensenDice` the default is 2.
assert_eq!(SorensenDice::new(2).str_distance("nacht", "night"), 0.75);
```
`DistanceMetric` was designed for `str` types, but is not limited to them. Calculating a distance is possible for all data types that are comparable and are passed as `IntoIterator`, e.g. as `Vec`.
```rust
use str_distance::{DistanceMetric, Levenshtein, DistanceValue};

assert_eq!(*Levenshtein::default().distance(&[1,2,3], &[1,2,3,4,5,6]), 3);
```
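
One way a metric can accept both strings and arbitrary comparable sequences is to make the core method generic over `IntoIterator` and let the string method delegate to it. The sketch below assumes exactly that design; the `Metric` trait and the toy `Mismatches` metric are hypothetical and chosen only to keep the example short, so the crate's actual `DistanceMetric` may look different.

```rust
/// Hypothetical trait for illustration; the crate's `DistanceMetric` may differ.
trait Metric {
    /// Core method: works on any two sequences with comparable items.
    fn distance<A, B>(&self, a: A, b: B) -> usize
    where
        A: IntoIterator,
        B: IntoIterator,
        A::Item: PartialEq<B::Item>;

    /// Convenience wrapper: compares two strings char by char.
    fn str_distance(&self, a: &str, b: &str) -> usize {
        self.distance(a.chars(), b.chars())
    }
}

/// Toy metric: counts positions whose items differ, plus the length difference.
struct Mismatches;

impl Metric for Mismatches {
    fn distance<A, B>(&self, a: A, b: B) -> usize
    where
        A: IntoIterator,
        B: IntoIterator,
        A::Item: PartialEq<B::Item>,
    {
        // `fuse` makes it safe to keep polling an iterator that has ended.
        let mut a = a.into_iter().fuse();
        let mut b = b.into_iter().fuse();
        let mut count = 0;
        loop {
            match (a.next(), b.next()) {
                (Some(x), Some(y)) => count += usize::from(x != y),
                (None, None) => return count,
                // One input ended early: each leftover item is one mismatch.
                _ => count += 1,
            }
        }
    }
}
```

Because the string method only forwards `chars()` iterators, the same implementation serves `&str` inputs and collections such as `Vec<u32>` without any extra code.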
Full docs available at docs.rs
Licensed under either of these: