icu_segmenter

🚧 [Experimental] Segment strings by lines, graphemes, words, and sentences.

This module is published as its own crate (icu_segmenter) and as part of the icu crate. See the latter for more details on the ICU4X project.

This module contains segmenter implementations for the following rules: line break, grapheme cluster break, word break, and sentence break.

🚧 This code is experimental; it may change at any time, in breaking or non-breaking ways, including in SemVer minor releases. It can be enabled with the "experimental" feature of the icu meta-crate. Use with caution. #2259

Examples

Line Break

Segment a string with default options:

```rust
use icu::segmenter::LineBreakSegmenter;

// Build a line segmenter from the test data provider.
let segmenter = LineBreakSegmenter::try_new_unstable(&icu_testdata::unstable())
    .expect("Data exists");

// Collect the offsets of the line break opportunities.
let breakpoints: Vec<usize> = segmenter.segment_str("Hello World").collect();
assert_eq!(&breakpoints, &[6, 11]);
```

See [LineBreakSegmenter] for more examples.
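
The breakpoints in the example above are offsets into the input string, and the list does not include offset 0. As a minimal sketch of how such a list could be turned back into string slices, the helper below treats 0 as an implicit first boundary. `segments_from_breakpoints` is a hypothetical name and not part of icu_segmenter, and the code assumes the offsets are byte indices that fall on UTF-8 character boundaries.

```rust
/// Hypothetical helper (not part of icu_segmenter): splits `text` at the
/// given offsets, treating offset 0 as an implicit first boundary.
///
/// Assumes the offsets are ascending byte indices on UTF-8 character
/// boundaries; slicing panics otherwise.
fn segments_from_breakpoints<'a>(text: &'a str, breakpoints: &[usize]) -> Vec<&'a str> {
    let mut segments = Vec::with_capacity(breakpoints.len());
    let mut start = 0;
    for &end in breakpoints {
        // Skip a leading 0 (or a repeated offset) so no empty slice is produced.
        if end > start {
            segments.push(&text[start..end]);
            start = end;
        }
    }
    segments
}
```

Called with `"Hello World"` and the breakpoints `[6, 11]`, this yields the slices `"Hello "` and `"World"`, which are the pieces between consecutive line break opportunities.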

Grapheme Cluster Break

See [GraphemeClusterBreakSegmenter] for examples.

Word Break

Segment a string:

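```rust
use icu::segmenter::WordBreakSegmenter;

// Build a word segmenter from the test data provider.
let segmenter = WordBreakSegmenter::try_new_unstable(&icu_testdata::unstable())
    .expect("Data exists");

// Collect the offsets of the word boundaries, including the start of the string.
let breakpoints: Vec<usize> = segmenter.segment_str("Hello World").collect();
assert_eq!(&breakpoints, &[0, 5, 6, 11]);
```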

See [WordBreakSegmenter] for more examples.
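
The word boundaries above delimit every segment, including the space between the two words. One possible way to keep only the non-whitespace segments is sketched below. `words_from_breakpoints` is a hypothetical helper, not an icu_segmenter API, and it assumes the offsets are ascending byte indices on UTF-8 character boundaries.

```rust
/// Hypothetical helper (not part of icu_segmenter): returns the slices
/// between consecutive breakpoints, dropping whitespace-only segments.
///
/// Slicing panics if an offset is out of range or not on a UTF-8
/// character boundary.
fn words_from_breakpoints<'a>(text: &'a str, breakpoints: &[usize]) -> Vec<&'a str> {
    breakpoints
        // Each adjacent pair of offsets delimits one segment.
        .windows(2)
        .map(|pair| &text[pair[0]..pair[1]])
        // Keep segments that contain something other than whitespace.
        .filter(|segment| !segment.trim().is_empty())
        .collect()
}
```

With `"Hello World"` and the breakpoints `[0, 5, 6, 11]`, the result is `"Hello"` and `"World"`. Punctuation segments would pass this filter, so a stricter check may be needed depending on the use case.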

Sentence Break

See [SentenceBreakSegmenter] for examples.

More Information

For more information on development, authorship, contributing, etc., please visit the ICU4X home page.