Stop words are words that don't carry much meaning, and are typically removed as a preprocessing step before text analysis or natural language processing. This crate contains common stop words for a variety of languages. All stop word lists are from this resource.
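Removing stop words means dropping every token that appears in a stop word list. Below is a minimal sketch that uses only the standard library. It assumes whitespace tokenization and case-insensitive matching. The five-word list is a hypothetical stand-in for a real list such as the one this crate returns, and `remove_stop_words` is an illustrative helper, not part of the crate.

```rust
use std::collections::HashSet;

/// Illustrative helper (not part of this crate): splits `text` on whitespace,
/// lowercases each token, and drops tokens found in `stop_words`.
fn remove_stop_words(text: &str, stop_words: &HashSet<&str>) -> Vec<String> {
    text.split_whitespace()
        .map(|token| token.to_lowercase())
        .filter(|token| !stop_words.contains(token.as_str()))
        .collect()
}

fn main() {
    // Hypothetical hand-picked list standing in for a real stop word list.
    let stop_words: HashSet<&str> = ["the", "a", "is", "of", "on"].iter().copied().collect();

    let tokens = remove_stop_words("The cat is on the mat", &stop_words);
    println!("{:?}", tokens); // ["cat", "mat"]
}
```

A real pipeline would usually also strip punctuation before matching. This sketch leaves that out to keep the filtering step visible.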
This crate currently includes the following languages:

- Arabic
- Bulgarian
- Catalan
- Czech
- Danish
- Dutch
- English
- Finnish
- French
- German
- Hebrew
- Hindi
- Hungarian
- Indonesian
- Italian
- Norwegian
- Polish
- Portuguese
- Romanian
- Russian
- Slovak
- Spanish
- Swedish
- Turkish
- Ukrainian
- Vietnamese
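Add it to your project from crates.io with (requires Cargo 1.62 or newer):

```sh
cargo add stop-words
```

Alternatively, add it to the `[dependencies]` section of your `Cargo.toml` manually:

```toml
[dependencies]
stop-words = "0.1.2"
```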
Using this crate is fairly straightforward:

```rust
fn main() {
    // Get the stop words
    let words = stop_words::get("english");

    // Print them
    for word in words {
        println!("{}", word);
    }
}
```

The function `get` accepts full language names (in English), ISO 639-1 language codes (2-letter codes), and ISO 639-2/T language codes (3-letter codes). This means you can also do this:

```rust
let words = stop_words::get("en");
```

or this:

```rust
let words = stop_words::get("eng");
```

Finally, you can also convert the `Vec<String>` of words to a `HashSet<String>`:

```rust
let vec = stop_words::get("en");
let set = stop_words::vectoset(vec);
```
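Converting to a set pays off when you filter many tokens. `HashSet::contains` is an average O(1) lookup, while `Vec::contains` scans the whole list. The sketch below shows an equivalent conversion using only the standard library. `vec_to_set` is a hypothetical helper, not this crate's function, and the hard-coded words stand in for the output of `get`.

```rust
use std::collections::HashSet;

/// Hypothetical std-only helper: turns a Vec<String> of stop words into a
/// HashSet<String>. Duplicate entries collapse into a single element.
fn vec_to_set(words: Vec<String>) -> HashSet<String> {
    words.into_iter().collect()
}

fn main() {
    // Stand-in for a list returned by the crate; "the" appears twice on purpose.
    let words: Vec<String> = ["the", "and", "the", "of"]
        .iter()
        .map(|w| w.to_string())
        .collect();

    let set = vec_to_set(words);

    // HashSet<String> can be queried with a &str thanks to String: Borrow<str>.
    println!("{}", set.len()); // 3
    println!("{}", set.contains("the")); // true
    println!("{}", set.contains("cat")); // false
}
```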