This crate provides multiple tokenizers built on top of `Scanner`.

`EbnfTokenizer`: tokenizes an EBNF grammar.

```rust
let grammar = r#"
expr := expr ('+'|'-') term | term ;
term := term ('*'|'/') factor | factor ;
factor := '-' factor | power ;
power := ufact '^' factor | ufact ;
ufact := ufact '!' | group ;
group := num | '(' expr ')' ;
"#;
let mut tok = EbnfTokenizer::new(grammar.chars());
```

`LispTokenizer`: tokenizes Lisp-like input.

```rust
LispTokenizer::new("(+ 3 4 5)".chars());
```

`MathTokenizer`: emits `MathToken` tokens.

```rust
MathTokenizer::new("3.4e-2 * sin(x)/(7! % -4)".chars());
```
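
These tokenizers can also be driven as iterators. A minimal sketch, assuming `MathTokenizer` implements `Iterator` and `MathToken` can be debug-printed:

```rust
// Sketch: drain the tokenizer, printing each MathToken as it is produced.
// Assumes MathTokenizer: Iterator<Item = MathToken> and MathToken: Debug.
for token in MathTokenizer::new("3.4e-2 * sin(x)".chars()) {
    println!("{:?}", token);
}
```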

`Scanner` is the building block for implementing tokenizers. You can build one from an `Iterator` and use it to extract tokens. Check the above-mentioned tokenizers for examples.

```rust
// Define a Tokenizer: wrap a Scanner and emit one token per call to next().
struct Tokenizer<I: Iterator<Item = char>>(Scanner<I>);

impl<I: Iterator<Item = char>> Iterator for Tokenizer<I> {
    type Item = String;
    fn next(&mut self) -> Option<Self::Item> {
        self.0.scan_whitespace();              // skip leading blanks
        self.0.scan_math_op()                  // then try each scan_X helper
            .or_else(|| self.0.scan_number())
            .or_else(|| self.0.scan_identifier())
    }
}

fn tokenizer<I: Iterator<Item = char>>(input: I) -> Tokenizer<I> {
    Tokenizer(Scanner::new(input))
}

// Use it to tokenize a math expression
let mut lx = tokenizer("3+4*2/-(1-5)^2^3".chars());
let token = lx.next();
```

`scan_X` functions try to consume some text object out of the scanner, for example numbers, identifiers, or quoted strings.
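
A minimal sketch of driving these helpers directly; the specific names (`scan_identifier`, `scan_whitespace`, `scan_number`) are the same illustrative stand-ins for the `scan_X` family used above:

```rust
// Sketch: pull successive text objects out of one Scanner.
// scan_identifier / scan_whitespace / scan_number are assumed scan_X helpers.
let mut s = Scanner::new("sin 42".chars());
let name = s.scan_identifier();   // consumes "sin"
s.scan_whitespace();              // discard the separating blank
let num = s.scan_number();        // consumes "42"
```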

`buffer_pos` and `set_buffer_pos` are used for backtracking, as long as the Scanner's buffer still has the data you need, that is, you haven't consumed or discarded it.
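
A hedged sketch of that backtracking pattern, again with illustrative `scan_X` names: checkpoint the buffer position, attempt one alternative, and rewind if the whole match fails.

```rust
// Sketch: try `<number> <identifier>` first, fall back to `<identifier>`.
// buffer_pos / set_buffer_pos checkpoint and restore the scan position.
let mut s = Scanner::new("foo".chars());
let checkpoint = s.buffer_pos();
let matched = s.scan_number().is_some() && s.scan_identifier().is_some();
if !matched {
    s.set_buffer_pos(checkpoint);  // rewind: the data is still buffered
    s.scan_identifier();           // the fallback alternative consumes "foo"
}
```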