Similar to the standard library's [`.char_indices()`], but instead of producing
only the start byte position, this library implements [`.char_ranges()`],
which produces both the start and end byte positions.

If the input `text` is a substring of some original text, and the produced
ranges should be offset relative to that original text, then instead of
[`.char_ranges()`] use [`.char_ranges_offset(offset)`]
or `.char_ranges().offset(offset)`.

Note that simply using [`.char_indices()`] and creating a range by mapping each
returned index `i` to `i..(i + 1)` is not guaranteed to produce a valid range,
given that a UTF-8 character can be up to 4 bytes long.

| Char | Bytes | Range |
| :---: | :---: | :----: |
| `'O'` | 1 | `0..1` |
| `'Ø'` | 2 | `0..2` |
| `'∈'` | 3 | `0..3` |
| `'🌏'` | 4 | `0..4` |

*Assumes the text is encoded in UTF-8.*
```rust
use char_ranges::CharRangesExt;

let text = "Hello 🗻∈🌏";

let mut chars = text.char_ranges();
assert_eq!(chars.as_str(), "Hello 🗻∈🌏");

assert_eq!(chars.next(), Some((0..1, 'H'))); // These chars are 1 byte
assert_eq!(chars.next(), Some((1..2, 'e')));
assert_eq!(chars.next(), Some((2..3, 'l')));
assert_eq!(chars.next(), Some((3..4, 'l')));
assert_eq!(chars.next(), Some((4..5, 'o')));
assert_eq!(chars.next(), Some((5..6, ' ')));

// Get the remaining substring
assert_eq!(chars.as_str(), "🗻∈🌏");

assert_eq!(chars.next(), Some((6..10, '🗻'))); // This char is 4 bytes
assert_eq!(chars.next(), Some((10..13, '∈'))); // This char is 3 bytes
assert_eq!(chars.next(), Some((13..17, '🌏'))); // This char is 4 bytes
assert_eq!(chars.next(), None);
```

## DoubleEndedIterator

[`CharRanges`] also implements [`DoubleEndedIterator`], making it possible to
iterate backwards.

```rust
use char_ranges::CharRangesExt;

let text = "ABCDE";

let mut chars = text.char_ranges();
assert_eq!(chars.as_str(), "ABCDE");

assert_eq!(chars.next(), Some((0..1, 'A')));
assert_eq!(chars.next_back(), Some((4..5, 'E')));
assert_eq!(chars.as_str(), "BCD");

assert_eq!(chars.next_back(), Some((3..4, 'D')));
assert_eq!(chars.next(), Some((1..2, 'B')));
assert_eq!(chars.as_str(), "C");

assert_eq!(chars.next(), Some((2..3, 'C')));
assert_eq!(chars.as_str(), "");

assert_eq!(chars.next(), None);
```

If the input `text` is a substring of some original text, and the produced
ranges should be offset relative to that original text, then instead of
[`.char_ranges()`] use [`.char_ranges_offset(offset)`]
or `.char_ranges().offset(offset)`.

```rust
use char_ranges::CharRangesExt;

let text = "Hello 👋 World 🌏";

let start = 11; // Start index of 'W'
let text = &text[start..]; // "World 🌏"

let mut chars = text.char_ranges_offset(start);
// or
// let mut chars = text.char_ranges().offset(start);

assert_eq!(chars.next(), Some((11..12, 'W'))); // These chars are 1 byte
assert_eq!(chars.next(), Some((12..13, 'o')));
assert_eq!(chars.next(), Some((13..14, 'r')));

assert_eq!(chars.next_back(), Some((17..21, '🌏'))); // This char is 4 bytes
```
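
To make the effect of the offset concrete, the following standard-library-only sketch shifts each range by the substring's start index, so the ranges index into the original text rather than the substring. It is an illustration under that assumption, not the crate's implementation; `offset_ranges` is a hypothetical helper name.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of this library): byte ranges of the
/// chars in `text`, shifted by `offset`.
fn offset_ranges(text: &str, offset: usize) -> Vec<(Range<usize>, char)> {
    text.char_indices()
        .map(|(i, c)| {
            let start = offset + i;
            (start..start + c.len_utf8(), c)
        })
        .collect()
}

fn main() {
    let original = "Hello 👋 World 🌏";

    let start = 11; // Start index of 'W'
    let sub = &original[start..]; // "World 🌏"

    // The shifted ranges are valid slices of the original text.
    for (range, c) in offset_ranges(sub, start) {
        assert_eq!(&original[range], c.to_string());
    }
}
```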