A Rust library to support natural language processing with Python bindings
The rsnltk
library integrates various existing Python-based NLP toolkits for powerful text analysis in Rust-based applications.
This toolkit is based on the Python-based Stanza and other important libraries.
A list of functions from Stanza we bind here include: - Tokenize - Sentence Segmentation - Multi-Word Token Expansion - Part-of-Speech & Morphological Features - Named Entity Recognition - Sentiment Analysis - Language Identification
Additionally, we can calculate the similarity between words based on WordNet though the semantic-kit
PyPI project via pip install semantic-kit
.
Make sure you install Python 3.6.6+ and PIP environment in your computer. Type python -V
in the Terminal should print no error message;
Install our PyPI package ner-kit
(version>=0.0.5a1) for binding the Stanza
package via pip install ner-kit==0.0.5a1
;
Then, Rust environment is also installed in your computer. I use IntelliJ to develop Rust-based applications, where you can write Rust codes;
Create a simple Rust application project with a main()
function.
Add the rsnltk
dependency to the Cargo.toml
file, keep up the Latest version.
After you add the rsnltk
dependency in the toml file
, install necessary language models from Stanza using the following Rust code for the first time you use this package.
rust
fn init_rsnltk_and_test(){
// 1. first install the necessary language models using language codes
let list_lang=vec!["en","zh"];
//e.g. you install two language models,
// namely, for English and Chinese text analysis.
download_langs(list_lang);
// 2. then do test NLP tasks
let text="I like Beijing!";
let lang="en";
// 2. Uncomment the below codes for Chinese NER
// let text="我喜欢北京、上海和纽约!";
// let lang="zh";
let list_ner=ner(text,lang);
for ner in list_ner{
println!("{:?}",ner);
}
}
Or you can manually install those language models via the Python-based ner-kit
package which provide more features in using Stanza. Go to: ner-kit
If no error occur in the above example, then it works. Finally, you can try the following advanced example usage.
Example 1: Part-of-speech Analysis
```rust fn test_pos(){ //let text="我喜欢北京、上海和纽约!"; //let lang="zh"; let text="I like apple"; let lang="en";
let list_result=pos(text,lang);
for word in list_result{
println!("{:?}",word);
}
} ```
Example 2: Sentiment Analysis ```rust fn test_sentiment(){ //let text="I like Beijing!"; //let lang="en"; let text="我喜欢北京"; let lang="zh";
let sentiments=sentiment(text,lang);
for sen in sentiments{
println!("{:?}",sen);
}
}
```
Example 3: Named Entity Recognition
rust
fn test_ner(){
// 1. for English NER
let text="I like Beijing!";
let lang="en";
// 2. Uncomment the below codes for Chinese NER
// let text="我喜欢北京、上海和纽约!";
// let lang="zh";
let list_ner=ner(text,lang);
for ner in list_ner{
println!("{:?}",ner);
}
}
Example 4: Tokenize
```rust fn test_tokenize(){
let text="我喜欢北京、上海和纽约!";
let lang="zh";
let list_result=tokenize(text,lang);
for ner in list_result{
println!("{:?}",ner);
}
}
```
Example 5: Tokenize Sentence
rust
fn test_tokenize_sentence(){
let text="I like apple. Do you like it? No, I am not sure!";
let lang="en";
let list_sentences=tokenize_sentence(text,lang);
for sentence in list_sentences{
println!("Sentence: {}",sentence);
}
}
Example 6: Language Identification
rust
fn test_lang(){
let list_text = vec!["I like Beijing!",
"我喜欢北京!",
"Bonjour le monde!"];
let list_result=lang(list_text);
for lang in list_result{
println!("{:?}",lang);
}
}
Example 7: MWT expand
rust
fn test_mwt_expand(){
let text="Nous avons atteint la fin du sentier.";
let lang="fr";
let list_result=mwt_expand(text,lang);
}
Example 8: Estimate the similarity between words in WordNet
rust
fn test_wordnet_similarity(){
let s1="dog.n.1";
let s2="cat.n.2";
let sims=wordnet_similarity(s1,s2);
for sim in sims{
println!("{:?}",sim);
}
}
Thank Stanford NLP Group for their hard work in Stanza.
MIT