Automatically generate type-safe Recursive Descent Parsers (RDP) from Rust structures representing an Abstract Syntax Tree (AST).
WARNING: Astray is not ready for production yet. Some features are missing and the documentation is very incomplete. I am working hard on these and will deliver on them as soon as possible.
An AST is, in essence, a tree that represents hierarchical relationships between concepts. An AST has a root, which represents the most encompassing structure in a syntax definition. The root is connected to nodes, and the nodes to other nodes, and so on. The end-nodes that do not link to any nodes but their parents are often called leafs.
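As a minimal, untyped illustration of these terms, the sketch below models a tree whose nodes are either leafs or branches and counts the leafs under a root. The `Node` enum, its labels, and the `count_leafs` and `sample_tree` helpers are hypothetical names chosen for this example; they are not part of Astray.

```rust
// A generic tree: a node is either a leaf or a branch with children.
#[derive(Debug)]
#[allow(dead_code)]
enum Node {
    Leaf(&'static str),
    Branch { label: &'static str, children: Vec<Node> },
}

// Recursively count the end-nodes (leafs) reachable from `node`.
fn count_leafs(node: &Node) -> usize {
    match node {
        Node::Leaf(_) => 1,
        Node::Branch { children, .. } => children.iter().map(count_leafs).sum(),
    }
}

// A root connected to one leaf and one inner node, which holds two more leafs.
fn sample_tree() -> Node {
    Node::Branch {
        label: "root",
        children: vec![
            Node::Leaf("a"),
            Node::Branch {
                label: "inner",
                children: vec![Node::Leaf("b"), Node::Leaf("c")],
            },
        ],
    }
}
```

Calling `count_leafs(&sample_tree())` walks the tree from the root down and counts the three leafs.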
Each syntax node (SN) in the tree can be represented in Rust by a `struct` or an `enum`:
- A `struct` represents a SN that is a composition of its children.
- An `enum` represents a SN that could branch to either of its children.

Below is an example of an AST defined in these terms. It represents the following syntax:
- A `Program` contains many `Function`s. `Program` is the root of our AST.
- A `Function` has a return type, no arguments, and a function body, which consists of many `Statement`s.
- A return type can be either an `int` keyword or a `float` keyword.
- A `Statement` can be either a `ReturnStatement` or something else we might define later.
- A `ReturnStatement` is a `return` keyword followed by an expression (`Expr`).
- An expression is just a literal integer.
- The `int`, `return` and `float` keywords, as well as `Identifier`, are leafs. They do not branch to other SNs, but rather contain a `Token`.
- Finally, we define the `Token` enum, which represents the smallest parsable unit in our syntax. Tokens are the building blocks of the AST.
```rust
struct Program {
    function: Vec<Function>,
}

struct Function {
    return_type: Type,
    identifier: Identifier,
    parens: (LParen, RParen),
    body: Vec<Statement>,
}

enum Type {
    Int(KwInt),
    Float(KwFloat),
}

enum Statement {
    ReturnStatement(ReturnStatement),
    // ...
}

struct ReturnStatement {
    kw_return: KwReturn, // keyword return
    expr: Expr,
}

struct Expr {
    expr: LiteralInt,
}

// Identifier, KwInt, KwFloat and KwReturn are all leafs.
// They are the bottom of the item hierarchy.
struct Identifier { value: Token }
struct KwReturn { value: Token }
struct KwInt { value: Token }
struct KwFloat { value: Token }
struct LiteralInt { value: Token }

enum Token {
    KwReturn,
    KwInt,
    KwFloat,
    LiteralInt(u32),
    Identifier(String),
}

// ...
```
Now that we have defined the types that represent our AST, we need to build a parser function that takes a list of tokens and correctly assembles the tree. So, we want to take something like this: "int func() { return 2;}" and parse it into a Program.
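To make the goal concrete, the sketch below builds by hand the `Program` value that a parser would be expected to return for `int func() { return 2; }`. The `LParen` and `RParen` leafs, the matching `Token` variants, the derives, and the `expected_program` helper are assumptions added so that the snippet compiles on its own. The braces and the semicolon are ignored because the types above do not model them.

```rust
// Token type from the example above, extended with hypothetical paren variants.
#[derive(Debug, PartialEq)]
#[allow(dead_code)]
enum Token { KwReturn, KwInt, KwFloat, LParen, RParen, LiteralInt(u32), Identifier(String) }

// Leafs: each one wraps the single token it stands for.
#[derive(Debug, PartialEq)]
struct Identifier { value: Token }
#[derive(Debug, PartialEq)]
struct KwReturn { value: Token }
#[derive(Debug, PartialEq)]
struct KwInt { value: Token }
#[derive(Debug, PartialEq)]
struct KwFloat { value: Token }
#[derive(Debug, PartialEq)]
struct LiteralInt { value: Token }
#[derive(Debug, PartialEq)]
struct LParen { value: Token } // hypothetical leaf
#[derive(Debug, PartialEq)]
struct RParen { value: Token } // hypothetical leaf

// Inner nodes: structs compose their children, enums branch between them.
#[derive(Debug, PartialEq)]
struct Expr { expr: LiteralInt }
#[derive(Debug, PartialEq)]
struct ReturnStatement { kw_return: KwReturn, expr: Expr }
#[derive(Debug, PartialEq)]
enum Statement { ReturnStatement(ReturnStatement) }
#[derive(Debug, PartialEq)]
#[allow(dead_code)]
enum Type { Int(KwInt), Float(KwFloat) }
#[derive(Debug, PartialEq)]
struct Function {
    return_type: Type,
    identifier: Identifier,
    parens: (LParen, RParen),
    body: Vec<Statement>,
}
#[derive(Debug, PartialEq)]
struct Program { function: Vec<Function> }

// The tree a parser should assemble for `int func() { return 2; }`.
fn expected_program() -> Program {
    Program {
        function: vec![Function {
            return_type: Type::Int(KwInt { value: Token::KwInt }),
            identifier: Identifier { value: Token::Identifier("func".to_string()) },
            parens: (LParen { value: Token::LParen }, RParen { value: Token::RParen }),
            body: vec![Statement::ReturnStatement(ReturnStatement {
                kw_return: KwReturn { value: Token::KwReturn },
                expr: Expr { expr: LiteralInt { value: Token::LiteralInt(2) } },
            })],
        }],
    }
}
```

Writing this tree out by hand shows how much nesting even a one-line function produces, which is the work a parser has to do from a flat token list.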
The traditional way of building such a parser is to write a Recursive Descent Parser by hand. This might sound daunting and hard: it is. However, we don't need to go that far thanks to Astray. By annotating the Rust items that represent SNs, we can use Astray to automatically generate type-safe parsing functions for each SN!
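For a sense of what is being automated, here is a minimal hand-written recursive descent sketch for the `ReturnStatement` and `Type` nodes. It is not the code Astray generates: the `SketchError` type, the `parse_*` function names, and the use of a plain `Iterator<Item = Token>` are assumptions made for illustration. Each `struct` node parses its children in order, and each `enum` node picks a branch based on the next token.

```rust
#[derive(Debug, PartialEq)]
#[allow(dead_code)]
enum Token { KwReturn, KwInt, KwFloat, LiteralInt(u32), Identifier(String) }

#[derive(Debug, PartialEq)]
struct KwReturn { value: Token }
#[derive(Debug, PartialEq)]
struct KwInt { value: Token }
#[derive(Debug, PartialEq)]
struct KwFloat { value: Token }
#[derive(Debug, PartialEq)]
struct LiteralInt { value: Token }
#[derive(Debug, PartialEq)]
struct Expr { expr: LiteralInt }
#[derive(Debug, PartialEq)]
enum Type { Int(KwInt), Float(KwFloat) }
#[derive(Debug, PartialEq)]
struct ReturnStatement { kw_return: KwReturn, expr: Expr }

// Hypothetical error type for this sketch; it is not Astray's ParseError.
#[derive(Debug, PartialEq)]
struct SketchError { expected: &'static str, found: Option<Token> }

// Leaf: succeeds only if the next token is the `return` keyword.
fn parse_kw_return(tokens: &mut impl Iterator<Item = Token>) -> Result<KwReturn, SketchError> {
    match tokens.next() {
        Some(Token::KwReturn) => Ok(KwReturn { value: Token::KwReturn }),
        found => Err(SketchError { expected: "`return` keyword", found }),
    }
}

// Leaf: succeeds only if the next token is an integer literal.
fn parse_literal_int(tokens: &mut impl Iterator<Item = Token>) -> Result<LiteralInt, SketchError> {
    match tokens.next() {
        Some(value @ Token::LiteralInt(_)) => Ok(LiteralInt { value }),
        found => Err(SketchError { expected: "integer literal", found }),
    }
}

// Struct node with a single child.
fn parse_expr(tokens: &mut impl Iterator<Item = Token>) -> Result<Expr, SketchError> {
    Ok(Expr { expr: parse_literal_int(tokens)? })
}

// Struct node: parse each child in declaration order.
fn parse_return_statement(
    tokens: &mut impl Iterator<Item = Token>,
) -> Result<ReturnStatement, SketchError> {
    let kw_return = parse_kw_return(tokens)?;
    let expr = parse_expr(tokens)?;
    Ok(ReturnStatement { kw_return, expr })
}

// Enum node: choose the variant that matches the next token.
fn parse_type(tokens: &mut impl Iterator<Item = Token>) -> Result<Type, SketchError> {
    match tokens.next() {
        Some(Token::KwInt) => Ok(Type::Int(KwInt { value: Token::KwInt })),
        Some(Token::KwFloat) => Ok(Type::Float(KwFloat { value: Token::KwFloat })),
        found => Err(SketchError { expected: "`int` or `float` keyword", found }),
    }
}
```

For example, `parse_return_statement(&mut vec![Token::KwReturn, Token::LiteralInt(2)].into_iter())` yields an `Ok` value, while a token list that starts with `Token::KwInt` yields an `Err`. Every additional node type needs another function of this shape, which is the repetitive part a derive macro can take over.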
To use Astray:

1. Define a `Token` type that represents each building block of your AST.
2. Annotate each SN with `#[derive(AstNode)]`, and then `#[token(<token>)]`, where `token` is the `Token` type you defined. It can have any name.
3. Annotate each leaf with `#[leaf(<token>,<token_instance>)]`, where `token` is the `Token` type you defined in step 1 and `token_instance` is the specific instance of the `Token` type that you are expecting this leaf to contain.
4. Create an iterator over `Token`s and call `::parse(&mut iter)` on the top-level type you defined. For the previous example, it would be `Program::parse(&mut iter)`.
5. The call returns a `Result<Program,ParseError<Program>>`. If your tokens match the AST specification you gave, you'll have a correctly parsed `Program` struct.

For more examples, take a look at the tests folder. In general, tests will be more accurate than any future documentation, since they are checked for errors by the compiler, contrary to markdown files.
There is much more to Astray than this! I'll document it as soon as possible.
```rust
// The Astray annotations described in the steps above
// (#[derive(AstNode)], #[token(...)], #[leaf(...)]) are not shown here.
fn main() {
    let tokens = vec![
        Token::Identifier("var1".to_string()),
        Token::EqualSign,
        Token::LiteralInt(2),
    ];

    let result = AssignStatement::parse(&mut tokens.into_token_iter());

    match result {
        Ok(_assign_statement) => println!("Assign statement was successfully parsed"),
        Err(err) => println!("There was a parsing error {err}"),
    }
}

enum Token {
    Identifier(String),
    EqualSign,
    LiteralInt(u32),
}

struct AssignStatement {
    ident: Identifier,
    eq: EqualSign,
    literal_int: LiteralInt,
}

struct Identifier { value: Token }
struct LiteralInt { value: Token }
struct EqualSign { value: Token }
```