Thai Natural Language Processing library in Rust, with Python and Node bindings. Formerly oxidized-thainlp.
Vec<String>
```bash
echo "ฉันกินข้าว" | nlpo3 segment
```
```python
from nlpo3 import load_dict, segment

load_dict("path/to/dict.file", "dict_name")
segment("สวัสดีครับ", "dict_name")
```
In `Cargo.toml`:

```toml
[dependencies]
nlpo3 = "1.3.2"
```
Create a tokenizer using a dictionary from file, then use it to tokenize a string (safe mode = true, and parallel mode = false):

```rust
use nlpo3::tokenizer::newmm::NewmmTokenizer;
use nlpo3::tokenizer::tokenizer_trait::Tokenizer;

let tokenizer = NewmmTokenizer::new("path/to/dict.file");
let tokens = tokenizer.segment("ห้องสมุดประชาชน", true, false).unwrap();
```
Create a tokenizer using a dictionary from a vector of Strings:

```rust
let words = vec!["ปาลิเมนต์".to_string(), "คอนสติติวชั่น".to_string()];
let tokenizer = NewmmTokenizer::from_word_list(words);
```
Add words to an existing tokenizer:

```rust
tokenizer.add_word(&["มิวเซียม"]);
```
Remove words from an existing tokenizer:

```rust
tokenizer.remove_word(&["กระเพรา", "ชานชลา"]);
```
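
To illustrate why adding or removing dictionary words changes the segmentation result, here is a minimal, self-contained sketch of greedy longest-match segmentation over a word set. It is not nlpo3's newmm implementation and does not use the nlpo3 API. `ToyTokenizer` and its methods are hypothetical names that only mirror the calls above for readability.

```rust
use std::collections::HashSet;

/// Hypothetical toy tokenizer: greedy longest-match over a word set.
/// This is NOT nlpo3's newmm algorithm, only an illustration.
struct ToyTokenizer {
    words: HashSet<String>,
}

impl ToyTokenizer {
    fn from_word_list(words: Vec<String>) -> Self {
        ToyTokenizer { words: words.into_iter().collect() }
    }

    fn add_word(&mut self, words: &[&str]) {
        for w in words {
            self.words.insert(w.to_string());
        }
    }

    fn remove_word(&mut self, words: &[&str]) {
        for w in words {
            self.words.remove(*w);
        }
    }

    /// At each position, take the longest dictionary word that matches;
    /// if none matches, emit a single character as its own token.
    fn segment(&self, text: &str) -> Vec<String> {
        let chars: Vec<char> = text.chars().collect();
        let mut tokens = Vec::new();
        let mut start = 0;
        while start < chars.len() {
            let mut end = start + 1; // fallback: one character
            for candidate_end in (start + 1..=chars.len()).rev() {
                let candidate: String = chars[start..candidate_end].iter().collect();
                if self.words.contains(&candidate) {
                    end = candidate_end;
                    break;
                }
            }
            tokens.push(chars[start..end].iter().collect());
            start = end;
        }
        tokens
    }
}

fn main() {
    let words = vec![
        "ห้อง".to_string(),
        "สมุด".to_string(),
        "ห้องสมุด".to_string(),
        "ประชาชน".to_string(),
    ];
    let mut tokenizer = ToyTokenizer::from_word_list(words);

    // With the compound word in the dictionary, it is kept as one token.
    assert_eq!(tokenizer.segment("ห้องสมุดประชาชน"), vec!["ห้องสมุด", "ประชาชน"]);

    // After removing it, the same text falls back to the shorter words.
    tokenizer.remove_word(&["ห้องสมุด"]);
    assert_eq!(tokenizer.segment("ห้องสมุดประชาชน"), vec!["ห้อง", "สมุด", "ประชาชน"]);

    // Adding it back restores the original segmentation.
    tokenizer.add_word(&["ห้องสมุด"]);
    assert_eq!(tokenizer.segment("ห้องสมุดประชาชน"), vec!["ห้องสมุด", "ประชาชน"]);
}
```

The sketch works on `char` boundaries for simplicity. A real Thai tokenizer also has to handle combining characters and ambiguous matches, which this toy version does not attempt.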
Generic test:

```bash
cargo test
```
Build the API documentation and open it in a browser to check:

```bash
cargo doc --open
```
Build (remove `--release` to keep debug information):

```bash
cargo build --release
```
Check `target/` for build artifacts.
Please report issues at https://github.com/PyThaiNLP/nlpo3/issues