Stringmetrics

This is a Rust library for approximate string matching that implements simple algorithms such has Hamming distance, Levenshtein distance, Jaccard similarity, and more.

Here are some useful quick links:

Algorithms

The main purpose of this library is to provide a variety of string metric functions. Included algorithms are:

See the documentation for full information. Some examples are below:

```rs // Basic levenshtein distance use stringmetrics::levenshtein;

assert_eq!(levenshtein("kitten", "sitting"), 3); ```

```rs // Levenshtein distance with a limit to save computation time use stringmetrics::levenshtein_limit;

asserteq!(levenshteinlimit("a very long string", "short!", 4), 4); ```

```rs // Set custom weights use stringmetrics::{levenshtein_weight, LevWeights};

// This struct holds insertion, deletion, and substitution costs let weights = LevWeights::new(4, 3, 2); asserteq!(levenshteinweight("kitten", "sitting", 100, &weights), 8); ```

```rs // Basic hamming distance use stringmetrics::hamming;

let a = "abcdefg"; let b = "aaadefa"; assert_eq!(hamming(a, b), Ok(3)); ```

Future Algorithms & Direction

Eventually, this library aims to add support for more algorithms. Intended work includes:

  1. Update levenshtein distance to have a more performant algorithm for short (<64 characters) and long (>100 characters) strings
  2. Add the Damerau–Levenshtein distance
  3. Add the Jaro–Winkler distance
  4. Add the Tversky index
  5. Add Cosine similarity
  6. Add some useful tokenizers with examples

License

See the LICENSE file for license information. The provided license does allow for proprietary use and adaptation; that being said, I kindly suggest that if you come up with an improvement, you submit a pull request and help us all out :)