Natural language processing library written in Rust. Still very much a work in progress. Basically an experiment, but hey maybe something cool will come out of it.
Currently working:
Near-term goals:
Use at your own risk. Some functionality is missing, and some of what exists is slow as molasses because it isn't optimized yet. I'm targeting master, and don't offer backward compatibility.
It's a crate with a `Cargo.toml`. Add this to your own `Cargo.toml`:
```
[dependencies]
natural = "0.3.0"

# Or, with Serde support enabled (use this entry instead of the one above):
natural = { version = "0.4.0", features = ["serde_support"] }
serde = "1.0"
```
```rust
extern crate natural;
use natural::distance::jaro_winkler_distance;
use natural::distance::levenshtein_distance;

assert_eq!(levenshtein_distance("kitten", "sitting"), 3);
assert_eq!(jaro_winkler_distance("dixon", "dicksonx"), 0.767);
```
Note: don't actually use `assert_eq!` on the Jaro-Winkler distance, since it returns a float and exact comparison is unreliable. To test, I actually use:
```rust
// Passes when `a` and `b` differ by less than 0.01.
fn f64_eq(a: f32, b: f32) {
    assert!((a - b).abs() < 0.01);
}
```
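To make the two ideas above concrete, here is a minimal, standard-library-only sketch of edit distance and of tolerance-based float comparison. It is not this crate's implementation: `levenshtein` and `approx_eq` are hypothetical names chosen for illustration, and the algorithm is the textbook two-row dynamic-programming formulation.

```rust
/// Textbook Levenshtein distance over Unicode scalar values.
/// Illustrative only; not taken from the `natural` crate.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // `prev` is the previous row of the DP table, `curr` the row being filled.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];

    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution (or match)
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Compares two floats within an explicit tolerance instead of exactly.
fn approx_eq(a: f64, b: f64, tolerance: f64) -> bool {
    (a - b).abs() < tolerance
}

fn main() {
    println!("{}", levenshtein("kitten", "sitting")); // prints 3
    println!("{}", approx_eq(0.767, 0.7666, 0.01)); // prints true
}
```

Returning a `bool` from the comparison helper, rather than asserting inside it, lets the caller decide whether a mismatch is a test failure or just a branch.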
There are two ways to gain access to the SoundEx algorithm in this library: either through a simple `soundex` function that accepts two `&str` parameters and returns a boolean, or through the `SoundexWord` struct. I will show both here.
```rust
use natural::phonetics::soundex;
use natural::phonetics::SoundexWord;

assert!(soundex("rupert", "robert"));

let s1 = SoundexWord::new("rupert");
let s2 = SoundexWord::new("robert");
assert!(s1.sounds_like(s2));
assert!(s1.sounds_like_str("robert"));
```
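If you are wondering why "rupert" and "robert" match, the sketch below encodes a word using the classic American Soundex rules and compares the resulting codes. It is a standalone illustration, not the crate's code: `soundex_code` and `sounds_alike` are hypothetical names, and the crate may handle edge cases (non-ASCII input, empty strings) differently.

```rust
/// Maps a lowercase ASCII letter to its Soundex digit, if it has one.
fn soundex_digit(c: char) -> Option<char> {
    match c {
        'b' | 'f' | 'p' | 'v' => Some('1'),
        'c' | 'g' | 'j' | 'k' | 'q' | 's' | 'x' | 'z' => Some('2'),
        'd' | 't' => Some('3'),
        'l' => Some('4'),
        'm' | 'n' => Some('5'),
        'r' => Some('6'),
        _ => None, // vowels, 'y', 'h', 'w'
    }
}

/// Classic American Soundex: first letter plus three digits, zero-padded.
/// Illustrative only; returns an empty string when `word` has no ASCII letters.
fn soundex_code(word: &str) -> String {
    let mut letters = word
        .chars()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_lowercase());

    let first = match letters.next() {
        Some(c) => c,
        None => return String::new(),
    };

    let mut code = String::new();
    code.push(first.to_ascii_uppercase());
    let mut last = soundex_digit(first);

    for c in letters {
        if code.len() == 4 {
            break;
        }
        // 'h' and 'w' are skipped without separating repeated codes.
        if c == 'h' || c == 'w' {
            continue;
        }
        let digit = soundex_digit(c);
        if let Some(d) = digit {
            if digit != last {
                code.push(d);
            }
        }
        // A vowel sets `last` to None, so the same digit may repeat after it.
        last = digit;
    }

    while code.len() < 4 {
        code.push('0');
    }
    code
}

/// Two words "sound alike" when their Soundex codes are equal.
fn sounds_alike(a: &str, b: &str) -> bool {
    soundex_code(a) == soundex_code(b)
}

fn main() {
    println!("{}", soundex_code("rupert")); // prints R163
    println!("{}", soundex_code("robert")); // prints R163
    println!("{}", sounds_alike("rupert", "robert")); // prints true
}
```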
```rust
extern crate natural;
use natural::tokenize::tokenize;

assert_eq!(tokenize("hello, world!"), vec!["hello", "world"]);
assert_eq!(tokenize("My dog has fleas."), vec!["My", "dog", "has", "fleas"]);
```
You can create n-grams with or without padding, e.g.:
```rust
extern crate natural;

use natural::ngram::get_ngram;
use natural::ngram::get_ngram_with_padding;

assert_eq!(get_ngram("hello my darling", 2), vec![vec!["hello", "my"], vec!["my", "darling"]]);

assert_eq!(get_ngram_with_padding("my fleas", 2, "----"), vec![
    vec!["----", "my"], vec!["my", "fleas"], vec!["fleas", "----"]]);
```
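As a rough picture of what padding does, the following sketch builds n-grams as sliding windows over whitespace-separated tokens. It is not the crate's implementation: the names `ngrams`, `ngrams_with_padding`, and `windows` are hypothetical, punctuation is not stripped, and padding with `n - 1` tokens on each side is an assumption that happens to match the bigram example above.

```rust
/// Returns every window of `n` consecutive tokens (empty when `n` is 0).
fn windows<'a>(tokens: &[&'a str], n: usize) -> Vec<Vec<&'a str>> {
    if n == 0 {
        return Vec::new(); // `slice::windows(0)` would panic
    }
    tokens.windows(n).map(|w| w.to_vec()).collect()
}

/// N-grams over whitespace-separated tokens, without padding.
fn ngrams<'a>(text: &'a str, n: usize) -> Vec<Vec<&'a str>> {
    let tokens: Vec<&str> = text.split_whitespace().collect();
    windows(&tokens, n)
}

/// N-grams with `n - 1` copies of `pad` added before and after the tokens
/// (an assumed padding scheme, chosen for illustration).
fn ngrams_with_padding<'a>(text: &'a str, n: usize, pad: &'a str) -> Vec<Vec<&'a str>> {
    let pad_count = n.saturating_sub(1);
    let mut tokens: Vec<&str> = Vec::new();
    tokens.extend(std::iter::repeat(pad).take(pad_count));
    tokens.extend(text.split_whitespace());
    tokens.extend(std::iter::repeat(pad).take(pad_count));
    windows(&tokens, n)
}

fn main() {
    println!("{:?}", ngrams("hello my darling", 2));
    // prints [["hello", "my"], ["my", "darling"]]
    println!("{:?}", ngrams_with_padding("my fleas", 2, "----"));
    // prints [["----", "my"], ["my", "fleas"], ["fleas", "----"]]
}
```

Padding gives the first and last words the same number of n-grams as words in the middle, which matters when n-gram counts feed a model.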
```rust
extern crate natural;
use natural::classifier::NaiveBayesClassifier;

let mut nbc = NaiveBayesClassifier::new();

nbc.train(STRING_TO_TRAIN, LABEL);
nbc.train(STRING_TO_TRAIN, LABEL);
nbc.train(STRING_TO_TRAIN, LABEL);
nbc.train(STRING_TO_TRAIN, LABEL);

nbc.guess(STRING_TO_GUESS); // returns the label with the highest probability
```
```rust
extern crate natural;
use natural::tf_idf::TfIdf;

// Assumes `TfIdf` has a `new()` constructor, like `NaiveBayesClassifier`.
let mut tf_idf = TfIdf::new();

tf_idf.add("this document is about rust.");
tf_idf.add("this document is about erlang.");
tf_idf.add("this document is about erlang and rust.");
tf_idf.add("this document is about rust. it has rust examples");

println!("{}", tf_idf.get("rust")); // 0.2993708f32
println!("{}", tf_idf.get("erlang")); // 0.13782766f32

// average of multiple terms
println!("{}", tf_idf.get("rust erlang")); // 0.21859923
```
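For readers new to TF-IDF, here is a small standalone sketch of one common textbook formulation: term frequency is a term's share of a document's tokens, inverse document frequency is `ln(N / df)`, and a multi-term query is scored as the average of its terms. The type `TfIdfSketch` and its methods are hypothetical, and the weighting is an assumption, so it is not expected to reproduce the figures in the comments above.

```rust
/// A toy TF-IDF index; illustrative only, not the crate's `TfIdf`.
struct TfIdfSketch {
    docs: Vec<Vec<String>>,
}

impl TfIdfSketch {
    fn new() -> Self {
        TfIdfSketch { docs: Vec::new() }
    }

    /// Lowercases the text and splits it on non-alphanumeric characters.
    fn add(&mut self, text: &str) {
        let tokens: Vec<String> = text
            .split(|c: char| !c.is_alphanumeric())
            .filter(|t| !t.is_empty())
            .map(|t| t.to_lowercase())
            .collect();
        self.docs.push(tokens);
    }

    /// Mean over all documents of tf(term, doc) * idf(term).
    fn score(&self, term: &str) -> f64 {
        let n = self.docs.len() as f64;
        let df = self
            .docs
            .iter()
            .filter(|d| d.iter().any(|t| t.as_str() == term))
            .count() as f64;
        if df == 0.0 {
            return 0.0; // unknown term (also covers an empty index)
        }
        let idf = (n / df).ln();
        let tf_sum: f64 = self
            .docs
            .iter()
            .map(|d| {
                if d.is_empty() {
                    return 0.0;
                }
                let count = d.iter().filter(|t| t.as_str() == term).count();
                count as f64 / d.len() as f64
            })
            .sum();
        idf * tf_sum / n
    }

    /// Averages the scores of the whitespace-separated terms in `query`.
    fn get(&self, query: &str) -> f64 {
        let terms: Vec<&str> = query.split_whitespace().collect();
        if terms.is_empty() {
            return 0.0;
        }
        let total: f64 = terms.iter().map(|t| self.score(&t.to_lowercase())).sum();
        total / terms.len() as f64
    }
}

fn main() {
    let mut index = TfIdfSketch::new();
    index.add("this document is about rust.");
    index.add("this document is about erlang.");
    index.add("this document is about erlang and rust.");
    index.add("this document is about rust. it has rust examples");

    println!("{}", index.get("rust"));
    println!("{}", index.get("erlang"));
    println!("{}", index.get("rust erlang")); // average of the two scores above
}
```

With this weighting, a word that appears in every document (such as "this") scores zero, because `ln(N / N)` is zero.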