rs-natural

Natural language processing library written in Rust. Still very much a work in progress. Basically an experiment, but hey maybe something cool will come out of it.

Currently working:

Near-sight goals:

How to use

Use at your own risk. Some functionality is missing, and some is slow as molasses because it isn't optimized yet. I'm targeting master and don't offer backward compatibility.

Setup

It's a crate with a Cargo.toml. Add this to your own Cargo.toml:

[dependencies.natural]
git = "https://github.com/cjqed/rs-natural"

Distance

```rust
extern crate natural;
use natural::distance::jaro_winkler_distance;
use natural::distance::levenshtein_distance;

assert_eq!(levenshtein_distance("kitten", "sitting"), 3);
assert_eq!(jaro_winkler_distance("dixon", "dicksonx"), 0.767); // approximate value
```

Note: don't actually use assert_eq! on the Jaro-Winkler distance, since it returns a floating-point value and exact comparison is unreliable. To test, I actually use:

```rust
// Passes when the two values differ by less than 0.01.
fn f64_eq(a: f32, b: f32) {
    assert!((a - b).abs() < 0.01);
}
```
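
If you're curious what the Levenshtein distance above is actually counting, here is a minimal sketch of the classic dynamic-programming approach. It is not the code in this crate, and `edit_distance` is a made-up name for illustration only.

```rust
// Illustration only: a hypothetical helper, not rs-natural's implementation.
// Counts the minimum number of single-character insertions, deletions and
// substitutions needed to turn `a` into `b`.
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // Row 0: turning an empty prefix of `a` into the first j chars of `b`
    // takes j insertions.
    let mut prev: Vec<usize> = (0..=b.len()).collect();

    for (i, ca) in a.iter().enumerate() {
        // First cell of the new row: delete the first i + 1 chars of `a`.
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + if ca == cb { 0 } else { 1 };
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr.push(substitution.min(deletion).min(insertion));
        }
        prev = curr;
    }
    prev[b.len()]
}
```

With this sketch, `edit_distance("kitten", "sitting")` gives 3: two substitutions and one insertion.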

Phonetics

There are two ways to use the Soundex algorithm in this library: through a simple soundex function that accepts two &str parameters and returns a boolean, or through the SoundexWord struct. I will show both here.

```rust
use natural::phonetics::soundex;
use natural::phonetics::SoundexWord;

assert!(soundex("rupert", "robert"));

let s1 = SoundexWord::new("rupert");
let s2 = SoundexWord::new("robert");
assert!(s1.sounds_like(s2));
assert!(s1.sounds_like_str("robert"));
```
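
To see why "rupert" and "robert" match, here is a rough sketch of one common variant of the American Soundex rules. The names `soundex_digit` and `soundex_code` are hypothetical, and the crate's own implementation may differ in its details.

```rust
// Illustration only: hypothetical helpers, not rs-natural's implementation.
// Maps a lowercase consonant to its Soundex digit; vowels, h, w and y map to None.
fn soundex_digit(c: char) -> Option<char> {
    match c {
        'b' | 'f' | 'p' | 'v' => Some('1'),
        'c' | 'g' | 'j' | 'k' | 'q' | 's' | 'x' | 'z' => Some('2'),
        'd' | 't' => Some('3'),
        'l' => Some('4'),
        'm' | 'n' => Some('5'),
        'r' => Some('6'),
        _ => None,
    }
}

// Builds a four-character code: the first letter plus up to three digits,
// padded with zeros. Returns an empty string if the word has no ASCII letters.
fn soundex_code(word: &str) -> String {
    let letters: Vec<char> = word
        .chars()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_lowercase())
        .collect();
    let first = match letters.first() {
        Some(&c) => c,
        None => return String::new(),
    };

    let mut code = String::new();
    code.push(first.to_ascii_uppercase());
    // The first letter's digit also counts when collapsing repeats.
    let mut last = soundex_digit(first);

    for &c in &letters[1..] {
        if code.len() == 4 {
            break;
        }
        match soundex_digit(c) {
            Some(d) => {
                // Skip a digit that repeats the previous one.
                if last != Some(d) {
                    code.push(d);
                }
                last = Some(d);
            }
            None => {
                // Vowels (and y) separate repeats; h and w do not.
                if c != 'h' && c != 'w' {
                    last = None;
                }
            }
        }
    }
    while code.len() < 4 {
        code.push('0');
    }
    code
}
```

Under these rules both "rupert" and "robert" encode to R163, which is why they are treated as sounding alike.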

Tokenization

```rust
extern crate natural;
use natural::tokenize::tokenize;

assert_eq!(tokenize("hello, world!"), vec!["hello", "world"]);
assert_eq!(tokenize("My dog has fleas."), vec!["My", "dog", "has", "fleas"]);
```
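
The results above are consistent with splitting on anything that isn't a letter or digit. A minimal sketch under that assumption follows; `split_words` is a hypothetical name, and the crate's tokenizer may treat apostrophes, hyphens, and the like differently.

```rust
// Illustration only: a hypothetical helper, not rs-natural's tokenizer.
// Splits on every non-alphanumeric character and drops empty pieces.
fn split_words(text: &str) -> Vec<&str> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|token| !token.is_empty())
        .collect()
}
```

Note that this simple rule would break "don't" into "don" and "t", so treat it as a starting point rather than a drop-in replacement.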

NGrams

You can create n-grams with or without padding, e.g.:

```rust
extern crate natural;

use natural::ngram::get_ngram;
use natural::ngram::get_ngram_with_padding;

assert_eq!(get_ngram("hello my darling", 2), vec![vec!["hello", "my"], vec!["my", "darling"]]);

assert_eq!(get_ngram_with_padding("my fleas", 2, "----"), vec![
    vec!["----", "my"], vec!["my", "fleas"], vec!["fleas", "----"]]);
```
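
For a feel of what happens under the hood, n-grams are just sliding windows over the token list, and padding adds n - 1 filler tokens at each end first. Here is a small sketch that works on already-tokenized input; `ngrams` and `ngrams_with_padding` are hypothetical names, not the crate's functions.

```rust
// Illustration only: hypothetical helpers, not rs-natural's implementation.
// Returns every run of `n` consecutive tokens.
fn ngrams<'a>(tokens: &[&'a str], n: usize) -> Vec<Vec<&'a str>> {
    // `windows` panics on a size of 0, so handle that case up front.
    if n == 0 || tokens.len() < n {
        return Vec::new();
    }
    tokens.windows(n).map(|w| w.to_vec()).collect()
}

// Surrounds the tokens with n - 1 copies of `pad` on each side, then
// takes the same sliding windows.
fn ngrams_with_padding<'a>(tokens: &[&'a str], n: usize, pad: &'a str) -> Vec<Vec<&'a str>> {
    let pad_count = n.saturating_sub(1);
    let mut padded = Vec::new();
    padded.extend(std::iter::repeat(pad).take(pad_count));
    padded.extend_from_slice(tokens);
    padded.extend(std::iter::repeat(pad).take(pad_count));
    ngrams(&padded, n)
}
```

Calling `ngrams_with_padding(&["my", "fleas"], 2, "----")` yields the same three bigrams as the padded example above.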

Classification

```rust
extern crate natural;
use natural::classifier::NaiveBayesClassifier;

let mut nbc = NaiveBayesClassifier::new();

nbc.train(STRING_TO_TRAIN, LABEL);
nbc.train(STRING_TO_TRAIN, LABEL);
nbc.train(STRING_TO_TRAIN, LABEL);
nbc.train(STRING_TO_TRAIN, LABEL);

nbc.guess(STRING_TO_GUESS); // returns the label with the highest probability
```
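
If the train/guess flow feels abstract, here is a toy multinomial naive Bayes with add-one smoothing that follows the same shape. `TinyNaiveBayes` is a hypothetical type written for this README; it is not the crate's classifier, and its tokenization, smoothing, and return type are all assumptions.

```rust
use std::collections::{HashMap, HashSet};

// Illustration only: a hypothetical toy classifier, not NaiveBayesClassifier.
struct TinyNaiveBayes {
    // label -> (word -> number of occurrences in that label's training text)
    word_counts: HashMap<String, HashMap<String, usize>>,
    // label -> number of training documents
    doc_counts: HashMap<String, usize>,
}

impl TinyNaiveBayes {
    fn new() -> Self {
        TinyNaiveBayes {
            word_counts: HashMap::new(),
            doc_counts: HashMap::new(),
        }
    }

    // Records one training document under `label`.
    fn train(&mut self, text: &str, label: &str) {
        *self.doc_counts.entry(label.to_string()).or_insert(0) += 1;
        let counts = self
            .word_counts
            .entry(label.to_string())
            .or_insert_with(HashMap::new);
        for word in text.split_whitespace() {
            *counts.entry(word.to_lowercase()).or_insert(0) += 1;
        }
    }

    // Returns the label with the highest log-probability, or None if untrained.
    fn guess(&self, text: &str) -> Option<String> {
        let total_docs: usize = self.doc_counts.values().sum();
        // Vocabulary size, used for add-one (Laplace) smoothing.
        let vocab: HashSet<&String> =
            self.word_counts.values().flat_map(|m| m.keys()).collect();

        let mut best: Option<(f64, &String)> = None;
        for (label, counts) in &self.word_counts {
            let label_total: usize = counts.values().sum();
            // Start from the log prior of the label.
            let mut score = (self.doc_counts[label] as f64 / total_docs as f64).ln();
            for word in text.split_whitespace() {
                let count = counts.get(&word.to_lowercase()).copied().unwrap_or(0);
                score += ((count + 1) as f64 / (label_total + vocab.len()) as f64).ln();
            }
            if best.map_or(true, |(s, _)| score > s) {
                best = Some((score, label));
            }
        }
        best.map(|(_, label)| label.clone())
    }
}
```

After training on "great wonderful fun" as "positive" and "awful boring bad" as "negative", guessing "fun and great" picks "positive".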

Tf-Idf

```rust
extern crate natural;
use natural::tf_idf::TfIdf;

// Assumed constructor, by analogy with NaiveBayesClassifier::new().
let mut tf_idf = TfIdf::new();

tf_idf.add("this document is about rust.");
tf_idf.add("this document is about erlang.");
tf_idf.add("this document is about erlang and rust.");
tf_idf.add("this document is about rust. it has rust examples");

println!("{}", tf_idf.get("rust")); // 0.2993708f32
println!("{}", tf_idf.get("erlang")); // 0.13782766f32

// average of multiple terms
println!("{}", tf_idf.get("rust erlang")); // 0.21859923
```
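
As a reminder of what tf-idf measures, here is a sketch of one textbook formulation: term frequency within a document times the natural log of (number of documents / documents containing the term). `tf_idf_sketch` is a hypothetical function, and the crate may weight or normalize differently, so don't expect it to reproduce the numbers above.

```rust
// Illustration only: a hypothetical helper using one textbook tf-idf formula,
// not rs-natural's implementation.
fn tf_idf_sketch(term: &str, doc: &[&str], corpus: &[Vec<&str>]) -> f64 {
    if doc.is_empty() {
        return 0.0;
    }
    // Term frequency: share of the document's tokens equal to `term`.
    let tf = doc.iter().filter(|w| **w == term).count() as f64 / doc.len() as f64;

    // Document frequency: how many documents contain `term` at all.
    let docs_with_term = corpus
        .iter()
        .filter(|d| d.iter().any(|w| *w == term))
        .count();
    if docs_with_term == 0 {
        return 0.0;
    }

    // Inverse document frequency: rarer terms get a larger weight.
    let idf = (corpus.len() as f64 / docs_with_term as f64).ln();
    tf * idf
}
```

With a two-document corpus of ["rust", "is", "fast"] and ["erlang", "is", "fun"], "rust" scores about 0.231 in the first document, while "is" scores 0 because it appears in every document.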