Vaporetto

Vaporetto is a fast and lightweight tokenizer based on pointwise prediction.

Examples

```rust
use std::fs::File;

use vaporetto::{Model, Predictor, Sentence};

// Load a trained model and build a predictor with tag prediction enabled.
let f = File::open("../resources/model.bin").unwrap();
let model = Model::read(f).unwrap();
let predictor = Predictor::new(model, true).unwrap();

// Output buffer, reused across sentences.
let mut buf = String::new();

let mut s = Sentence::default();

s.update_raw("まぁ社長は火星猫だ").unwrap();
predictor.predict(&mut s);
s.fill_tags();
s.write_tokenized_text(&mut buf);
assert_eq!(
    "まぁ/名詞/マー 社長/名詞/シャチョー は/助詞/ワ 火星/名詞/カセー 猫/名詞/ネコ だ/助動詞/ダ",
    buf,
);

s.update_raw("まぁ良いだろう").unwrap();
predictor.predict(&mut s);
s.fill_tags();
s.write_tokenized_text(&mut buf);
assert_eq!(
    "まぁ/副詞/マー 良い/形容詞/ヨイ だろう/助動詞/ダロー",
    buf,
);
```
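The tokenized text can be split back into surfaces and tags with the standard library alone. The following is a minimal sketch, not part of Vaporetto's API: `ParsedToken` and `parse_tokenized` are hypothetical names, and the code assumes that tokens are separated by a single space, that the surface and each tag are separated by `/`, and that neither character is escaped.

```rust
/// A token recovered from tokenized text (hypothetical helper type,
/// not provided by the vaporetto crate).
#[derive(Debug, PartialEq)]
struct ParsedToken<'a> {
    surface: &'a str,
    tags: Vec<&'a str>,
}

/// Splits text such as "火星/名詞/カセー 猫/名詞/ネコ" into tokens.
/// Assumption: a space separates tokens, '/' separates the surface
/// from each tag, and neither character is escaped.
fn parse_tokenized(text: &str) -> Vec<ParsedToken<'_>> {
    text.split(' ')
        .filter(|t| !t.is_empty())
        .map(|t| {
            let mut fields = t.split('/');
            // `split` always yields at least one item, so this is the surface.
            let surface = fields.next().unwrap_or("");
            ParsedToken {
                surface,
                tags: fields.collect(),
            }
        })
        .collect()
}

fn main() {
    let tokens = parse_tokenized("火星/名詞/カセー 猫/名詞/ネコ");
    for t in &tokens {
        println!("{} {:?}", t.surface, t.tags);
    }
}
```

When structured access is needed inside the same program, reading tokens directly from the `Sentence` is likely preferable to re-parsing text; this sketch only suits cases where the tokenized text has already been written out.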

Feature flags

The following features are disabled by default:

kytea - Enables the reader for models generated by KyTea.
train - Enables the trainer.
portable-simd - Uses the portable SIMD API instead of our SIMD-conscious data layout. (Nightly Rust is required.)

The following features are enabled by default:

std - Uses the standard library. If disabled, it uses the core library instead.
cache-type-score - Enables caching type scores for faster processing. If disabled, type scores are calculated in a straightforward manner.
fix-weight-length - Uses fixed-size arrays for storing scores to facilitate optimization. If disabled, vectors are used instead.
tag-prediction - Enables tag prediction.
charwise-pma - Uses the Charwise Daachorse instead of the standard version for faster prediction, although it can make loading a model file slower.
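As an illustration, the flags above can be selected in `Cargo.toml` as sketched below. The version requirement is a placeholder, not a recommendation, and this document does not state which combinations of disabled default features build successfully, so treat the commented-out variant as an assumption to verify.

```toml
[dependencies]
# Keep the default features and additionally enable the KyTea model reader.
# "*" is a placeholder; pin the version you actually use.
vaporetto = { version = "*", features = ["kytea"] }

# Alternative (assumed, unverified combination): opt out of the defaults
# and list only the features you need.
# vaporetto = { version = "*", default-features = false, features = ["std", "tag-prediction"] }
```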

License

Licensed under either of

Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.