rustlr

LR-Style Parser Generator

A Tutorial with several examples is available.

Besides traditional LR and LALR parser generation, Rustlr supports the following options:

  1. An experimental feature that generates parsers for Selective Marcus-Leermakers grammars. This is a larger class of unambiguous grammars than traditional LR and helps to allow new productions to be added to a grammar without creating conflicts (see the Appendix of the tutorial).
  2. The option of creating the abstract syntax data types and semantic actions from the grammar. Rustlr grammars contain a sub-language that controls how ASTs are to be generated.
  3. Support for choosing bumpalo to create recursive ASTs that use references instead of smart pointers: this enables deep pattern matching on recursive structures.
  4. Recognizes regex-style operators *, + and ?, which simplify the writing of grammars and allow better ASTs to be created.
  5. Generates a lexical scanner automatically from the grammar.
  6. Operator precedence and associativity declarations further allow grammars to be written in a form that is closer to EBNF syntax.
  7. The ability to train the parser, interactively or from script, for better error reporting.
  8. Generates parsers for Rust and for F#. Rustlr is designed to promote typed functional programming languages in the creation of compilers and language-analysis tools. Parser generation for other such languages will gradually become available.

Quick Example: Arithmetic Expressions and Their Abstract Syntax

The following are the contents of a Rustlr grammar file, `simplecalc.grammar`:
```
auto
terminals + * - / ( )
# defines terminals with values and how to extract values from tokens:
valueterminal VAL ~ i32 ~ Num(n) ~ n as i32
nonterminal E
nonterminal T : E
nonterminal F : E
startsymbol E
variant-group Operator + - * /

# production rules:
E --> E + T | E - T | T
T --> T * F | T / F | F
F:Neg --> - F
F:Val --> VAL
F --> ( E )

# The following lines are injected verbatim into the parser
!mod simplecalc_ast;
!fn main() {
!  let mut scanner1 = simplecalclexer::from_str("3+2*4");
!  let mut parser1 = make_parser();
!  let parseresult = parse_with(&mut parser1, &mut scanner1);
!  let ast =
!     parseresult.
!     unwrap_or_else(|x| {
!        println!("Parsing errors encountered; results not guaranteed..");
!        x
!     });
!  println!("\nAST: {:?}\n",&ast);
!}//main
```

In addition to a parser, the grammar generates a lexical scanner from the declarations of terminal symbols. It also creates the following abstract syntax type and the semantic actions that produce instances of the type.
```
#[derive(Debug)]
pub enum E {
  Operator(&'static str,LBox<E>,LBox<E>),
  Neg(LBox<E>),
  Val(i32),
  E_Nothing,
}
impl Default for E { fn default()->Self { E::E_Nothing } }
```
The form of the AST type(s) is determined by additional declarations within the grammar, including `variant-group` and the labels given to left-hand side nonterminal symbols (`Neg` and `Val`). The `variant-group` declaration combines what would have been four enum variants into a single `Operator` variant. The enum variants generated from the productions for `T` and `F` are merged into the type for `E` by the declarations `nonterminal T : E` and `nonterminal F : E`. Specifying operator precedence and associativity instead of using the `T` and `F` categories is also supported.

Rustlr contains a custom smart pointer, LBox, that automatically contains the line and column position of the start of the AST construct in the original source. This information is usually required beyond the parsing stage.
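A typical use of such an AST after parsing is to walk it, for example to evaluate the expression. The following is a minimal sketch and not code generated by Rustlr: it re-declares the enum with plain `Box<E>` standing in for `LBox<E>` (an assumption made so that the sketch compiles without the rustlr crate, at the cost of dropping the position information), and `eval` is a hypothetical helper.

```rust
// Stand-in for the generated AST type.
// ASSUMPTION: `Box<E>` replaces rustlr's `LBox<E>` so this compiles on its own.
#[allow(non_camel_case_types)]
#[derive(Debug)]
pub enum E {
    Operator(&'static str, Box<E>, Box<E>),
    Neg(Box<E>),
    Val(i32),
    E_Nothing,
}
impl Default for E {
    fn default() -> Self { E::E_Nothing }
}

/// Hypothetical evaluator (not part of rustlr). Returns `None` for the
/// placeholder variant, an unknown operator, division by zero, or overflow.
pub fn eval(e: &E) -> Option<i32> {
    match e {
        E::Val(n) => Some(*n),
        E::Neg(inner) => eval(inner)?.checked_neg(),
        E::Operator(op, lhs, rhs) => {
            let (a, b) = (eval(lhs)?, eval(rhs)?);
            match *op {
                "+" => a.checked_add(b),
                "-" => a.checked_sub(b),
                "*" => a.checked_mul(b),
                "/" => a.checked_div(b),
                _ => None,
            }
        }
        E::E_Nothing => None,
    }
}

fn main() {
    // Hand-built equivalent of the AST for "3+2*4".
    let ast = E::Operator(
        "+",
        Box::new(E::Val(3)),
        Box::new(E::Operator("*", Box::new(E::Val(2)), Box::new(E::Val(4)))),
    );
    println!("{:?} evaluates to {:?}", ast, eval(&ast));
}
```

Because the single `Operator` variant carries the operator as a string, one match arm covers all four binary operations. With separate variants per operator, the evaluator would need four nearly identical arms.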

Rustlr AST types implement the Default trait so that a partial result is always returned even when parse errors are encountered.
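The idea of a partial result can be illustrated with a toy, hand-written parser. This sketch assumes nothing about Rustlr's internals: `parse_sum` is hypothetical, handles only sums of integers, and uses `Box<E>` in place of `LBox<E>`. It shows how a `Default` value can fill the slot of a subtree that failed to parse, so that the caller still receives a tree inside `Err`.

```rust
#[allow(non_camel_case_types)]
#[derive(Debug, PartialEq)]
pub enum E {
    Operator(&'static str, Box<E>, Box<E>),
    Val(i32),
    E_Nothing,
}
impl Default for E {
    fn default() -> Self { E::E_Nothing }
}

/// Hypothetical toy parser for sums such as "1+2+3" (not rustlr's algorithm).
/// An operand that is not an integer is replaced by `E::default()`, and the
/// partial tree is returned in `Err` instead of being discarded.
pub fn parse_sum(src: &str) -> Result<E, E> {
    let mut had_error = false;
    let mut tree: Option<E> = None;
    for piece in src.split('+') {
        let operand = match piece.trim().parse::<i32>() {
            Ok(n) => E::Val(n),
            Err(_) => {
                had_error = true;
                E::default() // placeholder for the unparsable operand
            }
        };
        // Build a left-associative tree, as E --> E + T would.
        tree = Some(match tree {
            None => operand,
            Some(left) => E::Operator("+", Box::new(left), Box::new(operand)),
        });
    }
    let tree = tree.unwrap_or_default();
    if had_error { Err(tree) } else { Ok(tree) }
}

fn main() {
    // Same recovery pattern as the injected main in the grammar file.
    let ast = parse_sum("1+x+3").unwrap_or_else(|partial| {
        println!("Parsing errors encountered; results not guaranteed..");
        partial
    });
    println!("AST: {:?}", ast);
}
```

Since both arms of the `Result` carry an `E`, later stages can still inspect the tree after a failed parse, treating the placeholder variant as a marker for the missing part.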

Automatically generated AST types and semantic actions can always be manually overridden. A mixed approach is also possible.

As this is a quick example, we've also injected a `main` function directly into the generated parser file to demonstrate how to invoke the parser. To run this example:

  1. Install rustlr as a command-line application: `cargo install rustlr`
  2. Create a Cargo crate with at least `rustlr = "0.4"` in its dependencies (`cargo add rustlr`)
  3. Save the grammar in the crate as `simplecalc.grammar`. The filename determines the names of the modules created, and must have a `.grammar` suffix.
  4. Run rustlr in the crate with `rustlr simplecalc.grammar -o src/main.rs`
  5. Run the crate with `cargo run`

The expected output is
```
AST: Operator("+", Val(3), Operator("*", Val(2), Val(4)))
```

Please consult the tutorial for further documentation.