`scanlex` implements a simple lexical scanner.
Tokens are returned by repeatedly calling the `get` method (which will return `Token::End` if no tokens are left), or by iterating over the scanner. They represent numbers, characters, identifiers, or single/double quoted strings. There is also `Token::Error` to indicate a badly formed token.
This lexical scanner makes some assumptions; for example, a number may not be directly followed by a letter. No attempt is made in this version to decode C-style escape codes in strings. All whitespace is ignored. It is intended for processing generic structured data rather than code.
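To make these rules concrete, here is a minimal, self-contained sketch of a tokenizer that follows them. It is not scanlex's implementation: the `Tok` enum and the `tokenize` function are hypothetical names, and the sketch only handles plain decimal numbers (no signs or exponents).

```rust
// Toy tokenizer illustrating the stated assumptions.
// NOT scanlex's code: `Tok` and `tokenize` are hypothetical names.
#[derive(Debug, PartialEq)]
enum Tok {
    Num(f64),
    Iden(String),
    Str(String),
    Char(char),
    Error(String),
}

fn tokenize(input: &str) -> Vec<Tok> {
    let chars: Vec<char> = input.chars().collect();
    let mut toks = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            // All whitespace is ignored.
            i += 1;
        } else if c.is_ascii_digit() {
            let start = i;
            while i < chars.len() && (chars[i].is_ascii_digit() || chars[i] == '.') {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            if i < chars.len() && chars[i].is_alphabetic() {
                // A number may not be directly followed by a letter:
                // swallow the rest of the word and report one error.
                while i < chars.len() && chars[i].is_alphanumeric() {
                    i += 1;
                }
                toks.push(Tok::Error(format!("bad number starting with {}", text)));
            } else {
                match text.parse::<f64>() {
                    Ok(x) => toks.push(Tok::Num(x)),
                    Err(_) => toks.push(Tok::Error(format!("bad number {}", text))),
                }
            }
        } else if c.is_alphabetic() || c == '_' {
            let start = i;
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            toks.push(Tok::Iden(chars[start..i].iter().collect()));
        } else if c == '\'' || c == '"' {
            // Quoted string: contents are copied verbatim, escapes are not decoded.
            i += 1;
            let start = i;
            while i < chars.len() && chars[i] != c {
                i += 1;
            }
            if i < chars.len() {
                toks.push(Tok::Str(chars[start..i].iter().collect()));
                i += 1; // skip the closing quote
            } else {
                toks.push(Tok::Error("unterminated string".to_string()));
            }
        } else {
            // Anything else is a single-character token.
            toks.push(Tok::Char(c));
            i += 1;
        }
    }
    toks
}

fn main() {
    println!("{:?}", tokenize("iden 'string' * 10"));
}
```

The real scanner hands tokens out one at a time and signals exhaustion with `Token::End`; the sketch collects them into a `Vec` only to keep the example short.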
For example, the string `"iden 'string' * 10"` will be broken into four tokens:
```rust
extern crate scanlex;
use scanlex::{Scanner,Token};

let mut scan = Scanner::new("iden 'string' * 10");
assert_eq!(scan.get(),Token::Iden("iden".to_string()));
assert_eq!(scan.get(),Token::Str("string".to_string()));
assert_eq!(scan.get(),Token::Char('*'));
assert_eq!(scan.get(),Token::Num(10.0));
assert_eq!(scan.get(),Token::End);
```
`Scanner` implements `Iterator`. If you just wanted to extract the words from a string, then calling `as_iden` repeatedly will do the trick, since it returns `Option<String>`.
```rust
let v: Vec<_> = Scanner::new("bonzo 42 dog (cat)")
    .filter_map(|t| t.as_iden()).collect();
assert_eq!(v,&["bonzo","dog","cat"]);
```
By using `as_number` you can use this pattern to extract all the numbers out of a document, ignoring all other structure. The `scan.rs` example shows you the tokens that would be generated by parsing the string given on the command line.
This iterator does not stop at a `Token::Error` token, so you can handle errors yourself.
Usually it's important not to ignore structure. Say we have input strings that look like this "(WORD) = NUMBER":
```rust
scan.skip_chars("(")?;
let word = scan.get_iden()?;
scan.skip_chars(")=")?;
let num = scan.get_number()?;
```
This needs to be appropriately wrapped up, because any of these calls may fail!
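One way to do that wrapping is to put the calls in a function that returns `Result`, so that each `?` has somewhere to send its error. The sketch below is self-contained and does not use scanlex itself: `MiniScan`, its `String` error type, and `parse_pair` are hypothetical stand-ins that only mimic the shape of `skip_chars`, `get_iden`, and `get_number`.

```rust
// Hypothetical stand-in for a scanner with fallible methods.
// It only mimics the call shapes used above; it is not scanlex.
struct MiniScan {
    chars: Vec<char>,
    pos: usize,
}

impl MiniScan {
    fn new(s: &str) -> MiniScan {
        MiniScan { chars: s.chars().collect(), pos: 0 }
    }

    fn skip_space(&mut self) {
        while self.pos < self.chars.len() && self.chars[self.pos].is_whitespace() {
            self.pos += 1;
        }
    }

    // Expect each character of `expected` in turn, ignoring whitespace between them.
    fn skip_chars(&mut self, expected: &str) -> Result<(), String> {
        for want in expected.chars() {
            self.skip_space();
            match self.chars.get(self.pos) {
                Some(&c) if c == want => self.pos += 1,
                Some(&c) => return Err(format!("expected '{}', got '{}'", want, c)),
                None => return Err(format!("expected '{}', got end of input", want)),
            }
        }
        Ok(())
    }

    // Skip leading whitespace, then take the longest run matching `pred`.
    fn take_while(&mut self, pred: fn(char) -> bool) -> String {
        self.skip_space();
        let start = self.pos;
        while self.pos < self.chars.len() && pred(self.chars[self.pos]) {
            self.pos += 1;
        }
        self.chars[start..self.pos].iter().collect()
    }

    fn get_iden(&mut self) -> Result<String, String> {
        let word = self.take_while(|c| c.is_alphanumeric() || c == '_');
        if word.is_empty() {
            Err("expected identifier".to_string())
        } else {
            Ok(word)
        }
    }

    fn get_number(&mut self) -> Result<f64, String> {
        let text = self.take_while(|c| c.is_ascii_digit() || c == '.');
        text.parse::<f64>()
            .map_err(|e| format!("bad number '{}': {}", text, e))
    }
}

// The wrapping itself: the first failing call returns early with its error.
fn parse_pair(line: &str) -> Result<(String, f64), String> {
    let mut scan = MiniScan::new(line);
    scan.skip_chars("(")?;
    let word = scan.get_iden()?;
    scan.skip_chars(")=")?;
    let num = scan.get_number()?;
    Ok((word, num))
}

fn main() {
    match parse_pair("(width) = 42") {
        Ok((word, num)) => println!("{} -> {}", word, num),
        Err(e) => eprintln!("parse error: {}", e),
    }
}
```

With the real crate, the same function shape should apply, with the crate's own error type in place of `String`.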
It is a common pattern to create a scanner for each line of text read from a readable source. The `scanline.rs` example shows how to use `ScanLines` to accomplish this.
A more serious example (taken from the tests) is parsing JSON:
```rust
fn scanjson(scan: &mut Scanner) -> Result
}
```
(This is of course an Illustrative Example. JSON is a solved problem.)