LR-Style Parser Generator
A Tutorial with several examples is available.
Besides traditional LR and LALR parser generation, Rustlr supports
further options, including the operators `*`, `+` and `?`, which simplify
the writing of grammars and allow better ASTs to be created.

Rustlr aims to simplify the creation of precise and efficient parsers and will continue to evolve and incorporate new features, though backwards compatibility will be maintained as much as possible.
The following are the contents of a Rustlr grammar, `simplecalc.grammar`:
```
auto
terminals + * - / ; ( ) # verbatim terminal symbols
valterminal Int i32 # terminal symbol with value
nonterminal E
nonterminal T : E # specifies that AST for T should merge into E
nonterminal F : E
nonterminal ExpList
startsymbol ExpList
variant-group-for E BinaryOp + - * / # group operators in AST generation
E --> E + T | E - T | T
T --> T * F | T / F | F
F:Neg --> - F  # 'Neg' names enum variant in AST
F --> Int | ( E )
ExpList --> E<;+> ;?  # ;-separated list with optional trailing ;
!mod simplecalc_ast; // !-lines are injected verbatim into the parser
!fn main() {
!  let mut scanner1 = simplecalclexer::from_str("10+-2*4; 9-(4-1)");
!  let mut parser1 = make_parser();
!  let parseresult = parse_with(&mut parser1, &mut scanner1);
!  let ast =
!     parseresult.
!     unwrap_or_else(|x| {
!        println!("Parsing errors encountered; results not guaranteed..");
!        x
!     });
!  println!("\nAST: {:?}\n",&ast);
!}//main
```
The grammar recognizes one or more arithmetic expressions separated by
semicolons. In addition to a parser, the grammar generates a lexical
scanner from the declarations of terminal symbols. It also creates
the following abstract syntax types and the semantic actions that
produce instances of the types:

    pub enum E {
      BinaryOp(&'static str,LBox<E>,LBox<E>),
      Int(i32),
      Neg(LBox<E>),
      E_Nothing,
    }
    pub struct ExpList(pub Vec<LC<E>>,);

[LBox](https://docs.rs/rustlr/latest/rustlr/generic_absyn/struct.LBox.html)
and
[LC](https://docs.rs/rustlr/latest/rustlr/generic_absyn/struct.LC.html)
are structures that contain the line and column positions of the start
of the AST constructs in the original source. This information is
automatically inserted into the structures by the parser. `LBox`
encapsulates a `Box` and serves as a custom smart pointer, while `LC`
contains the extra information in an exposed tuple. Both `LBox` and
`LC` implement `Deref` and `DerefMut`.
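
To illustrate the idea of a position-carrying smart pointer, here is a minimal sketch. The `Located` type, its fields and its constructor are hypothetical and are not Rustlr's `LBox` or `LC`; the sketch only shows how implementing `Deref` and `DerefMut` lets client code use the wrapped value directly while the line and column travel along with it.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for a position-carrying smart pointer.
/// This is NOT rustlr's LBox; names and fields are assumptions.
#[derive(Debug)]
pub struct Located<T> {
    inner: Box<T>,
    pub line: usize,
    pub column: usize,
}

impl<T> Located<T> {
    pub fn new(value: T, line: usize, column: usize) -> Self {
        Located { inner: Box::new(value), line, column }
    }
}

impl<T> Deref for Located<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &*self.inner
    }
}

impl<T> DerefMut for Located<T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut *self.inner
    }
}

fn main() {
    let mut n = Located::new(41, 3, 7);
    *n += 1; // goes through DerefMut to reach the boxed value
    println!("{} at line {}, column {}", *n, n.line, n.column);
}
```

Because of the `Deref` implementations, pattern matching and method calls on the wrapped value need no explicit unwrapping, and the position stays available for error messages.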
Rustlr generates AST types based on the grammar, but special
declarations can control the precise structure of these types. A
struct is normally generated for nonterminal symbols with a single
production, while an enum is generated for nonterminals with multiple
productions, with a variant for each production. However, the enum
variants generated from the productions for `T` and `F` are merged
into the type for `E` by the declarations `nonterminal T : E` and
`nonterminal F : E`. The `variant-group-for` declaration combines what
would have been four variants into one. The `Neg` label on the unary
minus rule separates that case from the `BinaryOp` variant group.
Rustlr AST types implement the Default trait so that a partial result is always returned even when parse errors are encountered.
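
The injected `main` in the grammar above keeps whatever value the parse returns, on success and on failure alike. The following sketch shows that pattern in isolation, assuming a result type whose `Ok` and `Err` sides carry the same AST type. The `Ast` enum and the `parse_number` function are hypothetical and exist only for this illustration; the default value Rustlr actually generates may differ.

```rust
#[derive(Debug, Default, PartialEq)]
enum Ast {
    Number(i32),
    #[default]
    Nothing, // placeholder used when no value could be built
}

// Hypothetical parser (an assumption, not a Rustlr function):
// Ok on success, Err carrying a default AST on failure.
fn parse_number(src: &str) -> Result<Ast, Ast> {
    match src.trim().parse::<i32>() {
        Ok(n) => Ok(Ast::Number(n)),
        Err(_) => Err(Ast::default()),
    }
}

fn main() {
    for src in ["42", "4x2"] {
        // Either way an Ast comes back; the failure case is only reported.
        let ast = parse_number(src).unwrap_or_else(|partial| {
            eprintln!("Parsing errors encountered; results not guaranteed..");
            partial
        });
        println!("{:?}", ast);
    }
}
```

Since the type implements `Default`, the failure branch always has a well-formed value to hand back, so downstream code can continue with a partial result.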
Automatically generated AST types and semantic actions can always be manually overridden.
Specifying operator precedence and associativity instead of using the
`T` and `F` categories is also supported.
The generated parser and lexer normally form a separate module. However,
as this is a quick example, we've injected a `main` directly into the parser
to demonstrate how to invoke the parser.
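To run this example,

1. Install the rustlr command-line application: `cargo install rustlr`
2. Create a Cargo crate and run `cargo add rustlr` inside the crate.
3. Save the grammar in the crate as `simplecalc.grammar`. The filename determines the names of the modules created, and must have a `.grammar` suffix.
4. Run `rustlr simplecalc.grammar -o src/main.rs`
5. Run `cargo run`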
The expected output is

    AST: ExpList([BinaryOp("+", Int(10), BinaryOp("*", Neg(Int(2)), Int(4))), BinaryOp("-", Int(9), BinaryOp("-", Int(4), Int(1)))])

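
As an illustration of how client code might consume an AST of this shape, the following minimal sketch evaluates the two expressions from the output above. It assumes a simplified mirror of the generated enum in which plain `Box` stands in for `LBox` and position information is dropped; `eval` and `bin` are hypothetical helpers, not part of Rustlr.

```rust
// Simplified mirror of the generated type (assumption: Box replaces LBox).
#[derive(Debug)]
enum E {
    BinaryOp(&'static str, Box<E>, Box<E>),
    Int(i32),
    Neg(Box<E>),
}

// Evaluates an expression; None signals overflow, division by zero,
// or an operator string this sketch does not know.
fn eval(e: &E) -> Option<i32> {
    match e {
        E::Int(n) => Some(*n),
        E::Neg(x) => eval(x)?.checked_neg(),
        E::BinaryOp(op, a, b) => {
            let (x, y) = (eval(a)?, eval(b)?);
            // One variant covers all four operators; the string selects the operation.
            match *op {
                "+" => x.checked_add(y),
                "-" => x.checked_sub(y),
                "*" => x.checked_mul(y),
                "/" => x.checked_div(y),
                _ => None,
            }
        }
    }
}

// Hypothetical helper that builds a BinaryOp node.
fn bin(op: &'static str, a: E, b: E) -> E {
    E::BinaryOp(op, Box::new(a), Box::new(b))
}

fn main() {
    // 10+-2*4
    let first = bin("+", E::Int(10), bin("*", E::Neg(Box::new(E::Int(2))), E::Int(4)));
    // 9-(4-1)
    let second = bin("-", E::Int(9), bin("-", E::Int(4), E::Int(1)));
    println!("{:?}", eval(&first)); // prints Some(2)
    println!("{:?}", eval(&second)); // prints Some(6)
}
```

Grouping the four operators into one `BinaryOp` variant keeps the evaluator to a single arm for binary expressions, with the operator string selecting the arithmetic.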
Rustlr can also be invoked from within Rust by calling the `rustlr::generate` function.
Boxed labels such as `[x]` are now represented by `LC` instead of `LBox`
during auto-generation.
The wildcard `_` token now carries the original text of the token as
its semantic value by default. The `variant-group` directive is now
deprecated (though still available) in favor of `variant-group-for`.
When called from the `rustlr::generate` function, rustlr can be made completely silent if given the
`-trace 0` option. All reports are logged and returned by the function.
Given a parser instance `parser1`, it's now possible to call
`parser1.set_err_report(true)`, which will log parse errors internally
instead of printing them to stderr. The error report can be retrieved
by calling `parser1.get_err_report()`.
If the rustlr executable is given a file path that ends in ".y", it will attempt to convert a yacc/bison style grammar into rustlr's own grammar syntax, stripping away all semantic actions and other language-specific content. All other command-line options are ignored.
Please consult the tutorial for further documentation.