Vaporetto is a fast and lightweight tokenizer based on pointwise prediction.
```rust
use std::fs::File;
use std::io::BufReader;

use vaporetto::{Model, Predictor, Sentence};

fn main() {
    // Load a trained model from disk.
    let mut f = BufReader::new(File::open("model.raw").unwrap());
    let model = Model::read(&mut f).unwrap();
    let predictor = Predictor::new(model);

    // Tokenize a single sentence.
    let s = Sentence::from_raw("火星猫の生態").unwrap();
    let s = predictor.predict(s);

    println!("{:?}", s.to_tokenized_vec().unwrap());
    // ["火星", "猫", "の", "生態"]
}
```
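The idea behind pointwise prediction can be sketched in a few lines. The following is a simplified, self-contained illustration, not Vaporetto's implementation, assuming a KyTea-style linear model: every gap between two adjacent characters is scored independently from character n-gram features in a fixed window around it (each n-gram keyed by its offset from the gap), and the text is split wherever the score is positive. The `PointwiseModel` type, its fields, and the weights are hypothetical; the weights are hand-picked so that this toy model reproduces the output of the example above, whereas a real model learns its weights from annotated data and also uses other features such as character types.

```rust
use std::collections::HashMap;

// Hypothetical, simplified pointwise boundary classifier (NOT Vaporetto's internals).
struct PointwiseModel {
    // Weights keyed by (offset of the n-gram's first char relative to the gap, n-gram).
    weights: HashMap<(i32, String), f64>,
    bias: f64,
    window: usize, // number of characters considered on each side of a gap
    max_n: usize,  // longest character n-gram used as a feature
}

impl PointwiseModel {
    // Score the gap between chars[pos - 1] and chars[pos]; positive means "word boundary".
    fn score(&self, chars: &[char], pos: usize) -> f64 {
        let start = pos.saturating_sub(self.window);
        let end = (pos + self.window).min(chars.len());
        let mut score = self.bias;
        for i in start..end {
            for n in 1..=self.max_n {
                // Only use n-grams that fit entirely inside the window.
                if i + n > end {
                    break;
                }
                let ngram: String = chars[i..i + n].iter().collect();
                let offset = i as i32 - pos as i32;
                if let Some(w) = self.weights.get(&(offset, ngram)) {
                    score += *w;
                }
            }
        }
        score
    }

    // Decide every gap independently, then split the text at positive-scoring gaps.
    fn tokenize(&self, text: &str) -> Vec<String> {
        let chars: Vec<char> = text.chars().collect();
        let mut tokens = Vec::new();
        let mut current = String::new();
        for (pos, &c) in chars.iter().enumerate() {
            if pos > 0 && self.score(&chars, pos) > 0.0 {
                tokens.push(std::mem::take(&mut current));
            }
            current.push(c);
        }
        if !current.is_empty() {
            tokens.push(current);
        }
        tokens
    }
}

fn main() {
    // Hand-picked toy weights: not learned, not taken from any real model.
    let mut weights = HashMap::new();
    weights.insert((0, "猫".to_string()), 2.0); // favors a boundary right before 猫
    weights.insert((0, "の".to_string()), 2.0); // favors a boundary right before の
    weights.insert((-1, "の".to_string()), 2.0); // favors a boundary right after の
    let model = PointwiseModel { weights, bias: -1.0, window: 3, max_n: 3 };

    println!("{:?}", model.tokenize("火星猫の生態"));
    // prints ["火星", "猫", "の", "生態"]
}
```

Because no boundary decision depends on a neighboring decision, this scheme needs no sequence-level decoding such as a Viterbi search; each gap is classified on its own.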
The following Cargo features are available:

- `kytea`: Enables the reader for models generated by KyTea.
- `train`: Enables the trainer.
- `portable-simd`: Uses the portable SIMD API instead of our SIMD-conscious data layout. (Nightly Rust is required.)

Licensed under either of
- Apache License, Version 2.0
- MIT license

at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.