A Unicode lexer for use as a first pass when writing a parser.
The main function exported by this library is lex, which takes a &str and a table of valid symbols and converts the string into a token tree.
The kinds of token recognized by the lexer are:
* Idents: A string starting with an XID_Start character followed by a sequence of XID_Continue characters.
* Whitespace: Any sequence of whitespace characters.
* Brackets: Any bracket character, its corresponding closing bracket, and the tokens in between, returned as a sub-tree.
* Symbols: Any string that appears in the symbol table provided to lex.
* Strings: A string enclosed in either " or ', which may contain escaped characters.
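
To make the token kinds above concrete, here is a minimal sketch of a lexer that produces such a token tree. It is not this library's implementation or API: the names Token, lex_sketch, lex_until, closing and starts_with are hypothetical, only the ASCII brackets (), [] and {} are handled, and char::is_alphabetic / char::is_alphanumeric stand in as rough approximations of XID_Start / XID_Continue. The order in which the token kinds are tried, and picking the longest matching symbol, are also assumptions.

```rust
/// Hypothetical token type for illustration; the library's real type may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Ident(String),
    Whitespace(String),
    Bracket { open: char, close: char, contents: Vec<Token> },
    Symbol(String),
    Str(String),
}

/// Closing bracket for an opening one (ASCII only in this sketch).
fn closing(open: char) -> Option<char> {
    match open {
        '(' => Some(')'),
        '[' => Some(']'),
        '{' => Some('}'),
        _ => None,
    }
}

/// True if `rest` begins with the characters of `sym`.
fn starts_with(rest: &[char], sym: &str) -> bool {
    let mut it = rest.iter();
    sym.chars().all(|c| it.next() == Some(&c))
}

/// Lexes `src` into a token tree, using `symbols` as the table of valid symbols.
pub fn lex_sketch(src: &str, symbols: &[&str]) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut pos = 0;
    lex_until(&chars, &mut pos, None, symbols)
}

/// Lexes tokens until `close` is consumed (or to end of input if `close` is None).
fn lex_until(
    chars: &[char],
    pos: &mut usize,
    close: Option<char>,
    symbols: &[&str],
) -> Result<Vec<Token>, String> {
    let mut out = Vec::new();
    while *pos < chars.len() {
        let c = chars[*pos];
        if Some(c) == close {
            *pos += 1; // consume the closing bracket and finish this sub-tree
            return Ok(out);
        }
        if let Some(end) = closing(c) {
            // Bracket: recurse so the enclosed tokens become a sub-tree.
            *pos += 1;
            let contents = lex_until(chars, pos, Some(end), symbols)?;
            out.push(Token::Bracket { open: c, close: end, contents });
        } else if c.is_whitespace() {
            let start = *pos;
            while *pos < chars.len() && chars[*pos].is_whitespace() {
                *pos += 1;
            }
            out.push(Token::Whitespace(chars[start..*pos].iter().collect()));
        } else if c == '"' || c == '\'' {
            // String: read up to the matching quote; a backslash keeps the
            // following character verbatim.
            *pos += 1;
            let mut s = String::new();
            loop {
                match chars.get(*pos).copied() {
                    None => return Err("unterminated string".to_string()),
                    Some(q) if q == c => {
                        *pos += 1;
                        break;
                    }
                    Some('\\') => match chars.get(*pos + 1).copied() {
                        Some(escaped) => {
                            s.push(escaped);
                            *pos += 2;
                        }
                        None => return Err("unterminated escape".to_string()),
                    },
                    Some(other) => {
                        s.push(other);
                        *pos += 1;
                    }
                }
            }
            out.push(Token::Str(s));
        } else if c.is_alphabetic() {
            // Ident: approximates XID_Start followed by XID_Continue.
            let start = *pos;
            while *pos < chars.len() && (chars[*pos].is_alphanumeric() || chars[*pos] == '_') {
                *pos += 1;
            }
            out.push(Token::Ident(chars[start..*pos].iter().collect()));
        } else {
            // Symbol: the longest entry of the table that matches here.
            let rest = &chars[*pos..];
            let best = symbols
                .iter()
                .copied()
                .filter(|s| !s.is_empty() && starts_with(rest, s))
                .max_by_key(|s| s.chars().count());
            match best {
                Some(s) => {
                    *pos += s.chars().count();
                    out.push(Token::Symbol(s.to_string()));
                }
                None => return Err(format!("unexpected character {:?}", c)),
            }
        }
    }
    if close.is_some() {
        Err("unclosed bracket".to_string())
    } else {
        Ok(out)
    }
}
```

Under these assumptions, calling lex_sketch("foo + (bar)", &["+"]) yields an ident, whitespace, a symbol, whitespace, and a bracket token whose contents hold the single ident bar. A real implementation would use proper XID_Start / XID_Continue tables and the full set of Unicode bracket pairs.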
Patches welcome!