char-ranges


Similar to the standard library's [.char_indices()], but instead of producing only the start byte position, this library implements [.char_ranges()], which produces both the start and end byte positions.

Note that simply using [.char_indices()] and mapping each returned index i to the range i..(i + 1) is not guaranteed to produce a valid range, because a UTF-8 encoded character can be up to 4 bytes long.
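As a minimal sketch of the problem, using only the standard library: the helper name `std_char_ranges` below is hypothetical and is not part of this crate. It shows why the naive one-byte range fails on multi-byte characters, and how adding `char::len_utf8()` to the start index yields the full byte range.

```rust
use std::ops::Range;

/// Hypothetical helper (not this crate's API): byte range of every char,
/// computed from `char_indices()` plus `len_utf8()`.
fn std_char_ranges(text: &str) -> Vec<(Range<usize>, char)> {
    text.char_indices()
        .map(|(start, c)| (start..start + c.len_utf8(), c))
        .collect()
}

fn main() {
    let text = "aØ∈🌏";

    // The naive `i..(i + 1)` range is only correct for 1-byte chars.
    // `str::get` returns `None` when a range does not end on a char boundary.
    assert_eq!(text.get(0..1), Some("a"));
    assert_eq!(text.get(1..2), None); // 'Ø' is 2 bytes, so 1..2 splits it

    // Adding `len_utf8()` to the start index gives the full range.
    let ranges = std_char_ranges(text);
    assert_eq!(ranges[1], (1..3, 'Ø'));
    assert_eq!(ranges[2], (3..6, '∈'));
    assert_eq!(ranges[3], (6..10, '🌏'));

    // Each range can be used to slice the original string safely.
    assert_eq!(&text[ranges[3].0.clone()], "🌏");
}
```

Slicing with `&text[1..2]` instead of `text.get(1..2)` would panic at runtime, which is the failure the ranges are meant to avoid.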

| Char | Bytes | Range |
| :---: | :---: | :----: |
| 'O' | 1 | 0..1 |
| 'Ø' | 2 | 0..2 |
| '∈' | 3 | 0..3 |
| '🌏' | 4 | 0..4 |

Assumes the text is encoded in UTF-8.

Example

```rust
use char_ranges::CharRangesExt;

let text = "Hello 🗻∈🌏";

let mut chars = text.char_ranges();
assert_eq!(chars.as_str(), "Hello 🗻∈🌏");

assert_eq!(chars.next(), Some((0..1, 'H'))); // These chars are 1 byte
assert_eq!(chars.next(), Some((1..2, 'e')));
assert_eq!(chars.next(), Some((2..3, 'l')));
assert_eq!(chars.next(), Some((3..4, 'l')));
assert_eq!(chars.next(), Some((4..5, 'o')));
assert_eq!(chars.next(), Some((5..6, ' ')));

// Get the remaining substring
assert_eq!(chars.as_str(), "🗻∈🌏");

assert_eq!(chars.next(), Some((6..10, '🗻'))); // This char is 4 bytes
assert_eq!(chars.next(), Some((10..13, '∈'))); // This char is 3 bytes
assert_eq!(chars.next(), Some((13..17, '🌏'))); // This char is 4 bytes
assert_eq!(chars.next(), None);
```

Example - DoubleEndedIterator

[CharRanges] also implements [DoubleEndedIterator], making it possible to iterate backwards.

```rust
use char_ranges::CharRangesExt;

let text = "ABCDE";

let mut chars = text.char_ranges();
assert_eq!(chars.as_str(), "ABCDE");

assert_eq!(chars.next(), Some((0..1, 'A')));
assert_eq!(chars.next_back(), Some((4..5, 'E')));
assert_eq!(chars.as_str(), "BCD");

assert_eq!(chars.next_back(), Some((3..4, 'D')));
assert_eq!(chars.next(), Some((1..2, 'B')));
assert_eq!(chars.as_str(), "C");

assert_eq!(chars.next(), Some((2..3, 'C')));
assert_eq!(chars.as_str(), "");

assert_eq!(chars.next(), None);
```