A morphological analysis library in Rust. This project is a fork of kuromoji-rs.
Lindera aims to be a library that is easy to install and provides concise APIs for various Rust applications.
The following products are required to build:
Make sure you have activated the `full` feature of the lindera crate in Cargo.toml:
```toml
[dependencies]
lindera = { version = "0.12.0", features = ["full"] }
```
This example covers the basic usage of Lindera.
It will:

- Create a tokenizer in normal mode
- Tokenize the input text
- Output the tokens
```rust
use lindera::tokenizer::Tokenizer;
use lindera::LinderaResult;

fn main() -> LinderaResult<()> {
    // create tokenizer
    let tokenizer = Tokenizer::new()?;

    // tokenize the text
    let tokens = tokenizer.tokenize("関西国際空港限定トートバッグ")?;

    // output the tokens
    for token in tokens {
        println!("{}", token.text);
    }

    Ok(())
}
```
The above example can be run as follows:
```shell
% cargo run --features=ipadic --example=basic_example
```
You can see the result as follows:
```text
関西国際空港
限定
トートバッグ
```
You can supply user dictionary entries in addition to the default system dictionary. The user dictionary should be a CSV file in the following format:
<surface_form>,<part_of_speech>,<reading>
For example:
```shell
% cat userdic.csv
東京スカイツリー,カスタム名詞,トウキョウスカイツリー
東武スカイツリーライン,カスタム名詞,トウブスカイツリーライン
とうきょうスカイツリー駅,カスタム名詞,トウキョウスカイツリーエキ
```
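As a rough illustration of the three-column format above, the following sketch parses each line into a struct using only the standard library. The names `UserDictEntry` and `parse_user_dict_line` are hypothetical and not part of Lindera. The sketch also assumes that fields contain no commas or quotes, which may not match how Lindera itself reads the file.

```rust
/// One row of the simple user dictionary CSV.
/// Hypothetical type for illustration only; not part of Lindera.
#[derive(Debug, PartialEq)]
struct UserDictEntry {
    surface_form: String,
    part_of_speech: String,
    reading: String,
}

/// Parse one `<surface_form>,<part_of_speech>,<reading>` line.
/// Returns `None` unless the line has exactly three non-empty fields.
/// Assumption: no quoting or escaped commas inside fields.
fn parse_user_dict_line(line: &str) -> Option<UserDictEntry> {
    let fields: Vec<&str> = line.trim().split(',').collect();
    if fields.len() != 3 || fields.iter().any(|f| f.is_empty()) {
        return None;
    }
    Some(UserDictEntry {
        surface_form: fields[0].to_string(),
        part_of_speech: fields[1].to_string(),
        reading: fields[2].to_string(),
    })
}

fn main() {
    // Same rows as the userdic.csv example above.
    let csv = "東京スカイツリー,カスタム名詞,トウキョウスカイツリー\n\
               東武スカイツリーライン,カスタム名詞,トウブスカイツリーライン\n\
               とうきょうスカイツリー駅,カスタム名詞,トウキョウスカイツリーエキ";

    for (index, line) in csv.lines().enumerate() {
        match parse_user_dict_line(line) {
            Some(entry) => println!(
                "{} [{}] -> {}",
                entry.surface_form, entry.part_of_speech, entry.reading
            ),
            None => eprintln!("line {}: expected 3 comma-separated fields", index + 1),
        }
    }
}
```

A check like this can help catch malformed rows before the file is handed to the tokenizer.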
With a user dictionary, `Tokenizer` is created as follows:
```rust
use std::path::PathBuf;

use lindera::tokenizer::{Tokenizer, TokenizerConfig};
use lindera_core::viterbi::Mode;
use lindera_core::LinderaResult;

fn main() -> LinderaResult<()> {
    // create tokenizer with a user dictionary in normal mode
    let config = TokenizerConfig {
        user_dict_path: Some(PathBuf::from("./resources/userdic.csv")),
        mode: Mode::Normal,
        ..TokenizerConfig::default()
    };
    let tokenizer = Tokenizer::with_config(config)?;

    // tokenize the text
    let tokens = tokenizer.tokenize("東京スカイツリーの最寄り駅はとうきょうスカイツリー駅です")?;

    // output the tokens
    for token in tokens {
        println!("{}", token.text);
    }

    Ok(())
}
```
The above example can be run with `cargo run --example`:
```shell
% cargo run --features=ipadic --example=userdic_example
東京スカイツリー
の
最寄り駅
は
とうきょうスカイツリー駅
です
```
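To give some intuition for why user dictionary entries such as 東京スカイツリー come out as single tokens, here is a toy greedy longest-match segmenter. It is deliberately not Lindera's algorithm (the `viterbi` module imported above suggests a lattice-based search); it only shows that the set of known words determines where token boundaries can fall. The function `segment` and the small `system` word list are hypothetical and do not reflect Lindera's real dictionary data.

```rust
/// Greedy longest-match segmentation over a word list.
/// Toy stand-in for illustration; not Lindera's algorithm.
/// At each position, take the longest dictionary word that is a prefix of the
/// remaining text; if none matches, emit a single character.
fn segment(text: &str, dictionary: &[&str]) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut rest = text;
    while !rest.is_empty() {
        // Longest dictionary word that prefixes the remaining text.
        let best = dictionary
            .iter()
            .filter(|word| !word.is_empty() && rest.starts_with(**word))
            .max_by_key(|word| word.len());
        let len = match best {
            Some(word) => word.len(),
            // Fall back to one character (its UTF-8 length in bytes).
            None => rest.chars().next().map_or(0, |c| c.len_utf8()),
        };
        tokens.push(rest[..len].to_string());
        rest = &rest[len..];
    }
    tokens
}

fn main() {
    let text = "東京スカイツリーの最寄り駅はとうきょうスカイツリー駅です";

    // Hypothetical stand-in for a system dictionary.
    let system = [
        "東京", "スカイ", "ツリー", "の", "最寄り駅", "は", "とうきょう", "駅", "です",
    ];
    // Surface forms taken from the userdic.csv example.
    let user = [
        "東京スカイツリー",
        "東武スカイツリーライン",
        "とうきょうスカイツリー駅",
    ];

    // Toy system words only: the compounds are split into smaller pieces.
    println!("{:?}", segment(text, &system));

    // Toy system words plus user entries: the compounds stay whole.
    let combined: Vec<&str> = system.iter().chain(user.iter()).copied().collect();
    println!("{:?}", segment(text, &combined));
}
```

With the combined word list, the toy segmenter yields the same six tokens as the output shown above. A real tokenizer weighs competing segmentations rather than always taking the longest match.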
The API reference is available. Please see the following URL:

- lindera