malk-lexer

A Unicode lexer for use as a first pass when writing a parser.

The main function exported by this library is `lex`, which takes a `&str` and a table of valid symbols and converts the input into a token tree.

The kinds of token recognized by the lexer are:

* Idents: A string starting with an XID_Start character followed by a sequence of XID_Continue characters.
* Whitespace: Any sequence of whitespace characters.
* Brackets: Any bracket character, its corresponding closing bracket, and the tokens in between, returned as a sub-tree.
* Symbols: Any string that appears in the symbol table provided to `lex`.
* Strings: A string enclosed in either `"` or `'`, which may contain escaped characters.
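To show how these token kinds fit together into a tree, here is a minimal, std-only sketch. It is hypothetical and is not malk-lexer's actual API: the `Token` enum, `lex_sketch` and the `String` error type are invented for illustration. It assumes that only `()`, `[]` and `{}` count as brackets, that symbols are matched longest-first, that an ident takes priority over a symbol, and that `\n` and `\t` are the only special escapes. It also approximates XID_Start/XID_Continue with `char::is_alphabetic` and `char::is_alphanumeric` plus `_`. For the exact Unicode properties, the `unicode-xid` crate's `UnicodeXID` trait is one option.

```rust
// Hypothetical sketch, not malk-lexer's API: every name and rule here is illustrative.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Whitespace(String),
    Symbol(String),
    Str(String), // string contents with quotes stripped and escapes resolved
    Bracket { open: char, close: char, inner: Vec<Token> }, // sub-tree between brackets
}

// Assumption: only ASCII bracket pairs are recognised.
fn closing(open: char) -> Option<char> {
    match open {
        '(' => Some(')'),
        '[' => Some(']'),
        '{' => Some('}'),
        _ => None,
    }
}

// Byte length of the longest prefix of `s` whose chars all satisfy `pred`.
fn take_while(s: &str, pred: impl Fn(char) -> bool) -> usize {
    s.char_indices().find(|&(_, c)| !pred(c)).map_or(s.len(), |(i, _)| i)
}

// Lex a quoted string; `*pos` points at the opening quote.
fn lex_string(src: &str, pos: &mut usize, quote: char) -> Result<String, String> {
    let start = *pos;
    let mut chars = src[start + 1..].char_indices(); // quote chars are 1 byte each
    let mut value = String::new();
    while let Some((i, c)) = chars.next() {
        if c == quote {
            *pos = start + 1 + i + 1; // move past the closing quote
            return Ok(value);
        } else if c == '\\' {
            // Assumed escapes: \n and \t; any other escaped char is kept literally.
            match chars.next() {
                Some((_, 'n')) => value.push('\n'),
                Some((_, 't')) => value.push('\t'),
                Some((_, other)) => value.push(other),
                None => break,
            }
        } else {
            value.push(c);
        }
    }
    Err(format!("unterminated string at byte {}", start))
}

// Lex until the closing bracket `close`, or until end of input when `close` is None.
fn lex_seq(src: &str, syms: &[&str], pos: &mut usize, close: Option<char>) -> Result<Vec<Token>, String> {
    let mut out = Vec::new();
    while let Some(c) = src[*pos..].chars().next() {
        let start = *pos;
        if Some(c) == close {
            *pos += c.len_utf8();
            return Ok(out);
        } else if let Some(cl) = closing(c) {
            *pos += c.len_utf8();
            // Recurse so the tokens up to the matching bracket form a sub-tree.
            let inner = lex_seq(src, syms, pos, Some(cl))?;
            out.push(Token::Bracket { open: c, close: cl, inner });
        } else if c.is_whitespace() {
            *pos += take_while(&src[start..], char::is_whitespace);
            out.push(Token::Whitespace(src[start..*pos].to_string()));
        } else if c.is_alphabetic() {
            // std approximation of XID_Start followed by XID_Continue*.
            *pos += take_while(&src[start..], |ch| ch.is_alphanumeric() || ch == '_');
            out.push(Token::Ident(src[start..*pos].to_string()));
        } else if c == '"' || c == '\'' {
            out.push(Token::Str(lex_string(src, pos, c)?));
        } else if let Some(sym) = syms
            .iter()
            .filter(|s| !s.is_empty() && src[start..].starts_with(**s))
            .max_by_key(|s| s.len())
        {
            // Assumed longest-match rule: "+=" wins over "+".
            *pos += sym.len();
            out.push(Token::Symbol(sym.to_string()));
        } else {
            return Err(format!("unexpected {:?} at byte {}", c, start));
        }
    }
    match close {
        Some(cl) => Err(format!("missing closing {:?}", cl)),
        None => Ok(out),
    }
}

// Hypothetical entry point: source text plus a table of valid symbols.
fn lex_sketch(src: &str, syms: &[&str]) -> Result<Vec<Token>, String> {
    let mut pos = 0;
    lex_seq(src, syms, &mut pos, None)
}

fn main() {
    println!("{:#?}", lex_sketch("f(x, \"a\\\"b\") += y", &[",", "+", "+="]));
}
```

With the symbol table `[",", "+", "+="]`, the call in `main` produces the ident `f`, a `(` group whose sub-tree holds `x`, `,`, whitespace and the string `a"b`, then whitespace, the symbol `+=`, whitespace and the ident `y`. Each bracket level is handled by one recursive call, so the nesting of the output mirrors the nesting of the brackets. An unmatched or mismatched closing bracket surfaces as an error rather than as a token.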

Patches welcome!