char-ranges


Similar to the standard library's [.char_indices()], but instead of producing only the start byte position, this library implements [.char_ranges()], which produces both the start and end byte positions.

If the input text is a substring of some original text, and the produced ranges should be offset so that they are relative to the original text, then instead of [.char_ranges()] use [.char_ranges_offset](offset) or .[char_ranges]().[offset](offset).

Note that simply using [.char_indices()] and mapping each returned index i to the range i..(i + 1) is not guaranteed to produce a valid range, since a UTF-8 encoded character can be up to 4 bytes long.
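To make the pitfall concrete, here is a minimal sketch that uses only the standard library. The helper name `ranges_via_len_utf8` is hypothetical and is not part of this crate; it only illustrates how a correct end position can be derived from `char::len_utf8`.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of this crate): builds `(range, char)` pairs
/// from `char_indices()` by adding each char's UTF-8 length to its start index.
fn ranges_via_len_utf8(text: &str) -> Vec<(Range<usize>, char)> {
    text.char_indices()
        .map(|(i, c)| (i..i + c.len_utf8(), c))
        .collect()
}

fn main() {
    let text = "Ø∈";

    // The naive range `i..(i + 1)` for 'Ø' ends in the middle of the
    // character, so it is not a valid slice of `text`.
    assert_eq!(text.get(0..1), None);

    // Using the char's actual byte length gives a valid range.
    assert_eq!(text.get(0..2), Some("Ø"));

    assert_eq!(ranges_via_len_utf8(text), vec![(0..2, 'Ø'), (2..5, '∈')]);
}
```

`str::get` returns `None` rather than panicking when a range does not fall on character boundaries, which makes it a convenient way to check a range before slicing.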

| Char | Bytes | Range |
| :---: | :---: | :----: |
| 'O' | 1 | 0..1 |
| 'Ø' | 2 | 0..2 |
| '∈' | 3 | 0..3 |
| '🌏' | 4 | 0..4 |
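The byte counts in the table can be checked with the standard library's `char::len_utf8`. The helper `leading_range` below is a hypothetical name used only for illustration; it gives the range a character would occupy at the start of a string.

```rust
use std::ops::Range;

/// Hypothetical helper: the byte range `c` occupies when it is the
/// first character of a string.
fn leading_range(c: char) -> Range<usize> {
    0..c.len_utf8()
}

fn main() {
    assert_eq!(leading_range('O'), 0..1);
    assert_eq!(leading_range('Ø'), 0..2);
    assert_eq!(leading_range('∈'), 0..3);
    assert_eq!(leading_range('🌏'), 0..4);
}
```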

Assumes the text is encoded in UTF-8.

Example

```rust
use char_ranges::CharRangesExt;

let text = "Hello 🗻∈🌏";

let mut chars = text.char_ranges();
assert_eq!(chars.as_str(), "Hello 🗻∈🌏");

assert_eq!(chars.next(), Some((0..1, 'H'))); // These chars are 1 byte
assert_eq!(chars.next(), Some((1..2, 'e')));
assert_eq!(chars.next(), Some((2..3, 'l')));
assert_eq!(chars.next(), Some((3..4, 'l')));
assert_eq!(chars.next(), Some((4..5, 'o')));
assert_eq!(chars.next(), Some((5..6, ' ')));

// Get the remaining substring
assert_eq!(chars.as_str(), "🗻∈🌏");

assert_eq!(chars.next(), Some((6..10, '🗻'))); // This char is 4 bytes
assert_eq!(chars.next(), Some((10..13, '∈'))); // This char is 3 bytes
assert_eq!(chars.next(), Some((13..17, '🌏'))); // This char is 4 bytes
assert_eq!(chars.next(), None);
```

Example - DoubleEndedIterator

[CharRanges] also implements [DoubleEndedIterator], making it possible to iterate backwards.

```rust
use char_ranges::CharRangesExt;

let text = "ABCDE";

let mut chars = text.char_ranges();
assert_eq!(chars.as_str(), "ABCDE");

assert_eq!(chars.next(), Some((0..1, 'A')));
assert_eq!(chars.next_back(), Some((4..5, 'E')));
assert_eq!(chars.as_str(), "BCD");

assert_eq!(chars.next_back(), Some((3..4, 'D')));
assert_eq!(chars.next(), Some((1..2, 'B')));
assert_eq!(chars.as_str(), "C");

assert_eq!(chars.next(), Some((2..3, 'C')));
assert_eq!(chars.as_str(), "");

assert_eq!(chars.next(), None);
```
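For comparison, the same front-and-back iteration can be sketched with the standard library alone, since `char_indices()` is itself a `DoubleEndedIterator`. The function `std_char_ranges` is a hypothetical stand-in, not this crate's implementation, and it does not provide `as_str()`.

```rust
use std::ops::Range;

/// Hypothetical stand-in built on `char_indices()`; it yields the same
/// `(range, char)` shape but has no `as_str()` method.
fn std_char_ranges(text: &str) -> impl DoubleEndedIterator<Item = (Range<usize>, char)> + '_ {
    text.char_indices().map(|(i, c)| (i..i + c.len_utf8(), c))
}

fn main() {
    // 'A' is 1 byte, '∈' is 3 bytes, '🌏' is 4 bytes.
    let mut chars = std_char_ranges("A∈🌏");

    assert_eq!(chars.next(), Some((0..1, 'A')));
    assert_eq!(chars.next_back(), Some((4..8, '🌏')));
    assert_eq!(chars.next_back(), Some((1..4, '∈')));
    assert_eq!(chars.next(), None);
}
```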

Example - Offset Ranges

If the input text is a substring of some original text, and the produced ranges should be offset so that they are relative to the original text, then instead of [.char_ranges()] use [.char_ranges_offset](offset) or .[char_ranges]().[offset](offset).

```rust
use char_ranges::CharRangesExt;

let text = "Hello 👋 World 🌏";

let start = 11; // Start index of 'W'
let text = &text[start..]; // "World 🌏"

let mut chars = text.char_ranges_offset(start);
// or
// let mut chars = text.char_ranges().offset(start);

assert_eq!(chars.next(), Some((11..12, 'W'))); // These chars are 1 byte
assert_eq!(chars.next(), Some((12..13, 'o')));
assert_eq!(chars.next(), Some((13..14, 'r')));

assert_eq!(chars.next_back(), Some((17..21, '🌏'))); // This char is 4 bytes
```
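The point of offsetting is that each produced range can be used to index the original text directly. The sketch below illustrates this with the standard library only; `offset_ranges` is a hypothetical helper that mimics the idea and is not this crate's implementation.

```rust
use std::ops::Range;

/// Hypothetical helper: like `char_indices()`, but yields byte ranges
/// shifted by `offset` so they refer to positions in the original text.
fn offset_ranges(
    text: &str,
    offset: usize,
) -> impl DoubleEndedIterator<Item = (Range<usize>, char)> + '_ {
    text.char_indices().map(move |(i, c)| {
        let start = offset + i;
        (start..start + c.len_utf8(), c)
    })
}

fn main() {
    let original = "Hello 👋 World 🌏";

    let start = 11; // Start index of 'W'
    let sub = &original[start..]; // "World 🌏"

    // Every offset range slices the matching character out of `original`.
    for (range, c) in offset_ranges(sub, start) {
        assert_eq!(&original[range], c.to_string());
    }
}
```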