LR-Style Parser Generator
A tutorial with several examples is available.
Besides traditional LR and LALR parser generation, Rustlr supports a number of additional options, including the regex-style operators `*`, `+` and `?`, which simplify the writing of grammars and allow better ASTs to be created.
Rustlr aims to simplify the creation of precise and efficient parsers and will continue to evolve and incorporate new features, though backwards compatibility will be maintained as much as possible.
The following are the contents of a Rustlr grammar, `simplecalc.grammar`:
```
auto
terminals + * - / ( ) # verbatim terminal symbols
valterminal Int i32 # terminal symbol with value
nonterminal E
nonterminal T : E # specifies that AST for T should merge into E
nonterminal F : E
startsymbol E
variant-group BinaryOp + - * / # simplifies AST enum by combining variants
E --> E + T | E - T | T
T --> T * F | T / F | F
F:Neg --> - F # 'Neg' names enum variant in AST
F --> Int | ( E )

!mod simplecalc_ast; // !-lines are injected verbatim into the parser
!fn main() {
!  let mut scanner1 = simplecalclexer::from_str("10+-2*4");
!  let mut parser1 = make_parser();
!  let parseresult = parse_with(&mut parser1, &mut scanner1);
!  let ast =
!     parseresult.
!     unwrap_or_else(|x| {
!        println!("Parsing errors encountered; results not guaranteed..");
!        x
!     });
!  println!("\nAST: {:?}\n",&ast);
!}//main
```
In addition to a parser, Rustlr generates a lexical scanner from the grammar's declarations of terminal symbols. It also creates the following abstract syntax type, along with the semantic actions that produce instances of the type.
```
pub enum E {
  BinaryOp(&'static str,LBox<E>,LBox<E>),
  Int(i32),
  Neg(LBox<E>),
  // ...
}
```
The form of the AST type(s) was determined by additional declarations within the grammar. An enum is normally generated for each non-terminal with multiple productions, with a variant for each production. However, the enum variants generated from the productions for `T` and `F` are merged into the type for `E` by the declarations `nonterminal T : E` and `nonterminal F : E`. The `variant-group` declaration combined what would have been four variants into one. The `Neg` label on the unary minus rule separates that case from the `BinaryOp` variant group.
LBox is a custom smart pointer that automatically contains the line and column positions of the start of the AST construct in the original source. This information is usually required beyond the parsing stage.
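To make the idea of a position-carrying pointer concrete, here is a minimal sketch. The type `Located`, its fields, and the helper `describe` are hypothetical stand-ins written for this illustration; the real `LBox` is defined in the rustlr crate and may differ in its details.

```rust
use std::ops::Deref;

// Hypothetical stand-in for a position-carrying smart pointer.
// This is NOT rustlr's LBox, only an illustration of the idea.
#[derive(Debug)]
pub struct Located<T> {
    value: Box<T>,
    pub line: u32,
    pub column: u32,
}

impl<T> Located<T> {
    pub fn new(value: T, line: u32, column: u32) -> Self {
        Located { value: Box::new(value), line, column }
    }
}

// Deref lets the wrapper be used like the value it points to.
impl<T> Deref for Located<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.value
    }
}

// A later stage (for example a type checker) can report where a node came from.
pub fn describe(node: &Located<i32>) -> String {
    format!("value {} at line {}, column {}", **node, node.line, node.column)
}

fn main() {
    let node = Located::new(42, 3, 7);
    println!("{}", describe(&node)); // prints "value 42 at line 3, column 7"
}
```

Keeping the position inside the pointer means every AST node carries its source location without adding extra fields to each enum variant.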
Rustlr AST types implement the Default trait so that a partial result is always returned even when parse errors are encountered.
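The general pattern of returning a partial result can be sketched as follows. Everything here is an assumption made for illustration: the `Ast` type, its `Placeholder` variant, and the toy function `parse_sum` are hypothetical and are not part of rustlr. The sketch only mirrors the `unwrap_or_else` idiom used in the grammar's `main`, where both arms of the `Result` carry an AST.

```rust
// Hypothetical AST type; the Placeholder variant exists only so that
// Default has something to return when nothing could be parsed.
#[derive(Debug, PartialEq, Default)]
pub enum Ast {
    Sum(Vec<i32>),
    #[default]
    Placeholder,
}

// Hypothetical toy "parser" for inputs such as "1+2+3".
// Ok carries a clean parse, Err carries a partial one.
pub fn parse_sum(src: &str) -> Result<Ast, Ast> {
    let mut terms = Vec::new();
    let mut clean = true;
    for piece in src.split('+') {
        match piece.trim().parse::<i32>() {
            Ok(n) => terms.push(n),
            Err(_) => clean = false, // skip the bad term and keep going
        }
    }
    let ast = if terms.is_empty() { Ast::default() } else { Ast::Sum(terms) };
    if clean { Ok(ast) } else { Err(ast) }
}

fn main() {
    // The caller always gets an Ast, whether or not errors occurred.
    let ast = parse_sum("1+x+3").unwrap_or_else(|partial| {
        println!("Parsing errors encountered; results not guaranteed..");
        partial
    });
    println!("{:?}", ast); // prints Sum([1, 3])
}
```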
Automatically generated AST types and semantic actions can always be manually overridden.
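As an illustration of how an AST of this shape might be consumed after parsing, the following sketch evaluates an expression tree. It assumes a simplified mirror of the generated enum in which plain `Box` replaces `LBox`, so that it compiles without the rustlr crate. The functions `eval` and `sample` are hypothetical helpers and are not generated by Rustlr.

```rust
// Simplified mirror of the generated AST: Box stands in for LBox (assumption).
#[derive(Debug)]
pub enum E {
    BinaryOp(&'static str, Box<E>, Box<E>),
    Int(i32),
    Neg(Box<E>),
}

// Hypothetical evaluator. Returns None on division by zero,
// arithmetic overflow, or an unknown operator string.
pub fn eval(e: &E) -> Option<i32> {
    match e {
        E::Int(n) => Some(*n),
        E::Neg(inner) => eval(inner)?.checked_neg(),
        E::BinaryOp(op, lhs, rhs) => {
            let (a, b) = (eval(lhs)?, eval(rhs)?);
            match *op {
                "+" => a.checked_add(b),
                "-" => a.checked_sub(b),
                "*" => a.checked_mul(b),
                "/" => a.checked_div(b),
                _ => None,
            }
        }
    }
}

// Builds by hand the tree for the input "10+-2*4" used in the grammar's main.
pub fn sample() -> E {
    E::BinaryOp(
        "+",
        Box::new(E::Int(10)),
        Box::new(E::BinaryOp(
            "*",
            Box::new(E::Neg(Box::new(E::Int(2)))),
            Box::new(E::Int(4)),
        )),
    )
}

fn main() {
    println!("{:?}", eval(&sample())); // prints Some(2)
}
```

Because the four binary operators share one `BinaryOp` variant, the evaluator needs a single arm that dispatches on the operator string rather than four nearly identical arms.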
Specifying operator precedence and associativity instead of using the `T` and `F` categories is also supported.
The generated parser and lexer normally form a separate module. However, as this is a quick example, we've injected a `main` directly into the parser file to demonstrate how to invoke the parser.
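To run this example:
1. Install rustlr as a command-line application: `cargo install rustlr`
2. Create a Cargo crate and run `cargo add rustlr` inside the crate.
3. Save the grammar in the crate as `simplecalc.grammar`. The filename determines the names of the modules created, and must have a `.grammar` suffix.
4. Run rustlr on the grammar: `rustlr simplecalc.grammar -o src/main.rs`
5. Run the program: `cargo run`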
The expected output is
AST: BinaryOp("+", Int(10), BinaryOp("*", Neg(Int(2)), Int(4)))
Given a parser instance `parser1`, it's now possible to call `parser1.set_err_report(true)`, which will log parse errors internally instead of printing them to stderr. The error report can be retrieved by calling `parser1.get_err_report()`.
If the rustlr executable is given a file path that ends in ".y", it will attempt to convert a yacc/bison style grammar into rustlr's own grammar syntax, stripping away all semantic actions and other language-specific content. All other command-line options are ignored.
Please consult the tutorial for further documentation.