vaporetto_rules

Vaporetto is a fast and lightweight tokenizer based on pointwise prediction. vaporetto_rules provides rule-based filters for Vaporetto.

Examples

```rust
use std::fs::File;
use std::io::BufReader;

use vaporetto::{CharacterType, Model, Predictor, Sentence};
use vaporetto_rules::{
    SentenceFilter, StringFilter,
    sentence_filters::{ConcatGraphemeClustersFilter, KyteaWsConstFilter},
    string_filters::KyteaFullwidthFilter,
};

// Load a trained model and build a predictor (tag prediction disabled).
let mut f = BufReader::new(File::open("model.bin").unwrap());
let model = Model::read(&mut f).unwrap();
let mut predictor = Predictor::new(model, false).unwrap();

// Pre-filters rewrite the raw input string before tokenization.
let pre_filters: Vec<Box<dyn StringFilter<String>>> = vec![
    Box::new(KyteaFullwidthFilter),
];
// Post-filters adjust the predicted boundaries after tokenization.
let post_filters: Vec<Box<dyn SentenceFilter>> = vec![
    Box::new(ConcatGraphemeClustersFilter),
    Box::new(KyteaWsConstFilter::new(CharacterType::Digit)),
];

let input = "Vaporettoは仲良し家族👨‍👨‍👧‍👦を離れ離れにさせません。"
    .to_string();

// Thread the input through every pre-filter in order.
let preproc_input = pre_filters.iter().fold(input, |s, filter| filter.filter(s));

let mut sentence = Sentence::from_raw(preproc_input).unwrap();
predictor.predict(&mut sentence);

// Apply every post-filter to the tokenized sentence in place.
post_filters.iter().for_each(|filter| filter.filter(&mut sentence));

let mut buf = String::new();
sentence.write_tokenized_text(&mut buf);
// KyteaFullwidthFilter converts half-width characters to full-width,
// so the Latin token is expected in its full-width form.
assert_eq!(
    "Ｖａｐｏｒｅｔｔｏ は 仲良 し 家族 👨‍👨‍👧‍👦 を 離れ離れ に さ せ ま せ ん 。",
    buf,
);
```
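The pre-filter step in the example threads the input string through each filter with `fold`. The following is a rough, self-contained sketch of that pattern. It does not use vaporetto_rules: the `TextFilter` trait and the `Trim` and `AsciiToFullwidth` filters are hypothetical stand-ins defined only for illustration, and the crate's own `StringFilter` trait may differ in its exact definition.

```rust
/// Stand-in trait for illustration only (an assumption, not the crate's
/// `StringFilter`): a filter takes a string and returns a rewritten string.
trait TextFilter {
    fn filter(&self, text: String) -> String;
}

/// Hypothetical filter: strips leading and trailing whitespace.
struct Trim;

impl TextFilter for Trim {
    fn filter(&self, text: String) -> String {
        text.trim().to_string()
    }
}

/// Hypothetical filter: maps printable half-width ASCII to full-width forms.
struct AsciiToFullwidth;

impl TextFilter for AsciiToFullwidth {
    fn filter(&self, text: String) -> String {
        text.chars()
            .map(|c| match c {
                // U+0021..=U+007E map to U+FF01..=U+FF5E by a fixed offset.
                '!'..='~' => char::from_u32(c as u32 + 0xFEE0).unwrap_or(c),
                // Everything else (including spaces) is left unchanged.
                _ => c,
            })
            .collect()
    }
}

/// Applies every filter in order, passing each output to the next filter.
fn apply_all(filters: &[Box<dyn TextFilter>], input: String) -> String {
    filters.iter().fold(input, |s, f| f.filter(s))
}

fn main() {
    let filters: Vec<Box<dyn TextFilter>> =
        vec![Box::new(Trim), Box::new(AsciiToFullwidth)];
    let out = apply_all(&filters, "  abc 1 ".to_string());
    println!("{}", out);
}
```

Because `fold` passes ownership of the string from one filter to the next, the order of the filters in the vector is the order in which they run; an empty vector returns the input unchanged.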

License

Licensed under either of the Apache License, Version 2.0 or the MIT license, at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.