nlpO3

Thai Natural Language Processing library in Rust, with Python and Node bindings. Formerly oxidized-thainlp.

Features

Dictionary file

Usage

Command-line interface

```bash
echo "ฉันกินข้าว" | nlpo3 segment
```

Bindings

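```python
from nlpo3 import load_dict, segment

# Load a dictionary file and register it under the name "dict_name".
load_dict("path/to/dict.file", "dict_name")

# Segment the text using the dictionary registered above.
segment("สวัสดีครับ", "dict_name")
```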

As Rust library

crates.io

In Cargo.toml:

```toml
[dependencies]
# ...
nlpo3 = "1.3.2"
```

Create a tokenizer using a dictionary loaded from a file, then use it to tokenize a string (safe mode = true, parallel mode = false):

```rust
use nlpo3::tokenizer::newmm::NewmmTokenizer;
use nlpo3::tokenizer::tokenizer_trait::Tokenizer;

let tokenizer = NewmmTokenizer::new("path/to/dict.file");
let tokens = tokenizer.segment("ห้องสมุดประชาชน", true, false).unwrap();
```

Create a tokenizer using a dictionary from a vector of Strings:

```rust
let words = vec!["ปาลิเมนต์".to_string(), "คอนสติติวชั่น".to_string()];
let tokenizer = NewmmTokenizer::from_word_list(words);
```

Add words to an existing tokenizer:

```rust
tokenizer.add_word(&["มิวเซียม"]);
```

Remove words from an existing tokenizer:

```rust
tokenizer.remove_word(&["กระเพรา", "ชานชลา"]);
```
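To illustrate why the dictionary contents matter, here is a minimal, self-contained sketch of a greedy longest-match segmenter. It is not nlpo3's newmm implementation and does not use the nlpo3 crate; the function `longest_match_segment` and the word list are hypothetical, chosen only to show how adding a word to the dictionary can change the resulting tokens.

```rust
use std::collections::HashSet;

/// Toy greedy longest-match segmenter (hypothetical helper, NOT part of nlpo3).
/// At each position it takes the longest dictionary word that matches,
/// falling back to a single character when nothing matches.
fn longest_match_segment(text: &str, dict: &HashSet<String>) -> Vec<String> {
    // Work on chars, not bytes, because Thai characters are multi-byte in UTF-8.
    let chars: Vec<char> = text.chars().collect();
    let max_len = dict
        .iter()
        .map(|w| w.chars().count())
        .max()
        .unwrap_or(1)
        .max(1);

    let mut tokens: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let longest = max_len.min(chars.len() - start);
        let mut end = start + 1; // fallback: emit one character
        for len in (1..=longest).rev() {
            let candidate: String = chars[start..start + len].iter().collect();
            if dict.contains(&candidate) {
                end = start + len;
                break;
            }
        }
        tokens.push(chars[start..end].iter().collect());
        start = end;
    }
    tokens
}

fn main() {
    let mut dict: HashSet<String> = ["ห้อง", "สมุด", "ประชาชน"]
        .iter()
        .map(|w| w.to_string())
        .collect();

    // Without the compound word, the text splits into three tokens.
    println!("{:?}", longest_match_segment("ห้องสมุดประชาชน", &dict));
    // prints ["ห้อง", "สมุด", "ประชาชน"]

    // After adding the compound word, the longer match wins.
    dict.insert("ห้องสมุด".to_string());
    println!("{:?}", longest_match_segment("ห้องสมุดประชาชน", &dict));
    // prints ["ห้องสมุด", "ประชาชน"]
}
```

The same idea applies when removing words: once an entry is gone from the dictionary, text that used to match it is split according to the remaining entries.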

Build

Requirements

Steps

Generic test:

```bash
cargo test
```

Build API document and open it to check:

```bash
cargo doc --open
```

Build (remove --release to keep debug information):

```bash
cargo build --release
```

Check target/ for build artifacts.

Development documents

Issues

Please report issues at https://github.com/PyThaiNLP/nlpo3/issues