icu_segmenter

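A segmenter implementation for the following rules:

- Line breaking, compatible with Unicode Standard Annex #14 and CSS properties
- Grapheme cluster, word, and sentence breaking, compatible with Unicode Standard Annex #29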

Examples

Line Break

Segment a string with default options:

```rust
use icu_segmenter::LineBreakSegmenter;

let provider = icu_testdata::get_provider();
let segmenter = LineBreakSegmenter::try_new(&provider)
    .expect("Data exists");

let breakpoints: Vec<usize> = segmenter.segment_str("Hello World").collect();
assert_eq!(&breakpoints, &[6, 11]);
```
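The breakpoints are byte offsets into the input, so they can be used to slice the original string. The following is a minimal sketch using only the standard library; `split_at_breakpoints` is a hypothetical helper for illustration and is not part of `icu_segmenter`:

```rust
/// Hypothetical helper (not part of icu_segmenter): slices `text` at the
/// given byte offsets and returns the pieces in order.
/// Assumes `breakpoints` is sorted ascending and that every offset lies on
/// a UTF-8 character boundary within `text`; slicing panics otherwise.
fn split_at_breakpoints<'a>(text: &'a str, breakpoints: &[usize]) -> Vec<&'a str> {
    let mut segments = Vec::new();
    let mut start = 0;
    for &end in breakpoints {
        // Skip a leading 0 or a repeated offset so no empty segment is emitted.
        if end > start {
            segments.push(&text[start..end]);
            start = end;
        }
    }
    segments
}
```

Called with `"Hello World"` and the breakpoints `[6, 11]`, the helper returns `"Hello "` and `"World"`. Because a leading `0` is skipped, it also accepts breakpoint lists that start at offset 0.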

Segment a string with CSS option overrides:

```rust
use icu_segmenter::{LineBreakSegmenter, LineBreakOptions, LineBreakRule, WordBreakRule};

let mut options = LineBreakOptions::default();
options.line_break_rule = LineBreakRule::Strict;
options.word_break_rule = WordBreakRule::BreakAll;
options.ja_zh = false;
let provider = icu_testdata::get_provider();
let segmenter = LineBreakSegmenter::try_new_with_options(&provider, options)
    .expect("Data exists");

let breakpoints: Vec<usize> = segmenter.segment_str("Hello World").collect();
assert_eq!(&breakpoints, &[1, 2, 3, 4, 6, 7, 8, 9, 10, 11]);
```

Segment a Latin1 byte string:

```rust
use icu_segmenter::LineBreakSegmenter;

let provider = icu_testdata::get_provider();
let segmenter = LineBreakSegmenter::try_new(&provider)
    .expect("Data exists");

let breakpoints: Vec<usize> = segmenter.segment_latin1(b"Hello World").collect();
assert_eq!(&breakpoints, &[6, 11]);
```
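Latin1 (ISO-8859-1) stores each character in exactly one byte, so an offset into a Latin1 byte string counts characters as well as bytes. That is not true of the same text in UTF-8. As a rough sketch, the hypothetical helper below (not part of `icu_segmenter`) decodes Latin1 bytes into a `String` to make the difference visible:

```rust
/// Hypothetical helper (not part of icu_segmenter): decodes Latin1
/// (ISO-8859-1) bytes into a `String`.
/// Every Latin1 byte maps to the Unicode code point with the same value,
/// so a plain `u8 as char` cast is a correct decoding.
fn latin1_to_string(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}
```

For pure ASCII input such as `b"Hello World"` the two encodings coincide. A byte such as `0xE9` (é) becomes two bytes once decoded to UTF-8, so offsets computed on the Latin1 bytes should not be applied to the decoded `String` without conversion.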

Grapheme Cluster Break

Segment a string:

```rust
use icu_segmenter::GraphemeClusterBreakSegmenter;
let provider = icu_testdata::get_provider();
let segmenter = GraphemeClusterBreakSegmenter::try_new(&provider)
    .expect("Data exists");

let breakpoints: Vec<usize> = segmenter.segment_str("Hello 🗺").collect();
// World Map (U+1F5FA) is encoded in four bytes in UTF-8.
assert_eq!(&breakpoints, &[0, 1, 2, 3, 4, 5, 6, 10]);
```
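The jump from 6 to 10 in the example above shows that the offsets count UTF-8 bytes rather than characters. If a character index is needed instead, one possible conversion uses only the standard library; `byte_offset_to_char_index` is a hypothetical helper and not part of `icu_segmenter`:

```rust
/// Hypothetical helper (not part of icu_segmenter): converts a byte offset
/// into the number of `char`s that precede it in `text`.
/// Panics if `byte_offset` is out of range or not on a UTF-8 character
/// boundary.
fn byte_offset_to_char_index(text: &str, byte_offset: usize) -> usize {
    text[..byte_offset].chars().count()
}
```

For `"Hello 🗺"`, the byte offset 10 corresponds to character index 7, since the seven characters occupy ten bytes. Note that this counts Unicode scalar values, which can differ from the number of grapheme clusters.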

Segment a Latin1 byte string:

```rust
use icu_segmenter::GraphemeClusterBreakSegmenter;
let provider = icu_testdata::get_provider();
let segmenter = GraphemeClusterBreakSegmenter::try_new(&provider)
    .expect("Data exists");

let breakpoints: Vec<usize> = segmenter.segment_latin1(b"Hello World").collect();
assert_eq!(&breakpoints, &[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
```

Word Break

Segment a string:

```rust
use icu_segmenter::WordBreakSegmenter;
let provider = icu_testdata::get_provider();
let segmenter = WordBreakSegmenter::try_new(&provider)
    .expect("Data exists");

let breakpoints: Vec<usize> = segmenter.segment_str("Hello World").collect();
assert_eq!(&breakpoints, &[0, 5, 6, 11]);
```

Segment a Latin1 byte string:

```rust
use icu_segmenter::WordBreakSegmenter;
let provider = icu_testdata::get_provider();
let segmenter = WordBreakSegmenter::try_new(&provider)
    .expect("Data exists");

let breakpoints: Vec<usize> = segmenter.segment_latin1(b"Hello World").collect();
assert_eq!(&breakpoints, &[0, 5, 6, 11]);
```

Sentence Break

Segment a string:

```rust
use icu_segmenter::SentenceBreakSegmenter;
let provider = icu_testdata::get_provider();
let segmenter = SentenceBreakSegmenter::try_new(&provider)
    .expect("Data exists");

let breakpoints: Vec<usize> = segmenter.segment_str("Hello World").collect();
assert_eq!(&breakpoints, &[0, 11]);
```

Segment a Latin1 byte string:

```rust
use icu_segmenter::SentenceBreakSegmenter;
let provider = icu_testdata::get_provider();
let segmenter = SentenceBreakSegmenter::try_new(&provider)
    .expect("Data exists");

let breakpoints: Vec<usize> = segmenter.segment_latin1(b"Hello World").collect();
assert_eq!(&breakpoints, &[0, 11]);
```

More Information

For more information on development, authorship, contributing, etc., please visit the ICU4X home page.