** Intro
langlang is a parser generator based on [[https://en.wikipedia.org/wiki/Parsing_expression_grammar][Parsing Expression Grammars]].
** Usage
Provide an input grammar and an input to be parsed with the grammar.
Let's look at an example in which the data to be parsed is in the form of comma-separated values. Here's a simple grammar that can parse input in that format:
#+begin_example
File <- Line*
Line <- Val (',' Val)* '\n'
Val <- (![,\n] .)*
#+end_example
If the above grammar is fed with the following input:
#+begin_example
c1,c2
10,20
30,40
#+end_example
This is the output returned:
#+begin_example
File {
  Line {
    Val { "c" "1" }
    ","
    Val { "c" "2" }
    "\n"
  }
  Line {
    Val { "1" "0" }
    ","
    Val { "2" "0" }
    "\n"
  }
  Line {
    Val { "3" "0" }
    ","
    Val { "4" "0" }
    "\n"
  }
}
#+end_example
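To make the shape of that output easier to follow, here is a minimal hand-written sketch of the same three rules in Python. It is not langlang's implementation or API: the function names (~parse_file~, ~parse_line~, ~parse_val~) and the tuple-based tree are assumptions made up for illustration, and the input is assumed to end with a newline.

```python
# Hand-written sketch of the CSV grammar above.
# NOT langlang's code or API; every name here is hypothetical.

def parse_val(text, pos):
    # Val <- (![,\n] .)*
    # Collect characters until a comma, a newline, or end of input.
    chars = []
    while pos < len(text) and text[pos] not in ",\n":
        chars.append(text[pos])
        pos += 1
    return ("Val", chars), pos


def parse_line(text, pos):
    # Line <- Val (',' Val)* '\n'
    val, pos = parse_val(text, pos)
    children = [val]
    # (',' Val)* : zero or more comma-prefixed values
    while pos < len(text) and text[pos] == ",":
        nxt, pos = parse_val(text, pos + 1)
        children += [",", nxt]
    # The trailing '\n' is mandatory; without it the rule fails.
    if pos >= len(text) or text[pos] != "\n":
        return None
    children.append("\n")
    return ("Line", children), pos + 1


def parse_file(text):
    # File <- Line*
    # Keep matching Line until it fails; zero lines is still a success.
    pos, lines = 0, []
    while True:
        result = parse_line(text, pos)
        if result is None:
            break
        line, pos = result
        lines.append(line)
    return ("File", lines), pos


tree, end = parse_file("c1,c2\n10,20\n30,40\n")
```

Each function mirrors one production, and each nested tuple mirrors one ~Name { ... }~ node in the output above.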
** Line by line
Parsing expression grammars are interpreted top-down and left to right. The identifiers to the left of the arrow are called rules or productions, and to the right of the arrow are their expressions. These expressions borrow a whole lot from [[https://en.wikipedia.org/wiki/Regular_expression][Regular Expressions]].
*** File
File <- Line*
The STAR (~*~) operator, for one, has the same meaning as in Regular Expressions: it tries to match the expression ~Line~ zero or more times. Identifiers on the expression side are how productions call other productions. Notice that ~File~ is the first production to be called because it is the first one to appear in the grammar.
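The "zero or more" behaviour can be sketched as a small Python combinator. This is an illustration only, under the assumption that a parser is a function taking ~(text, pos)~ and returning the new position, or ~None~ on failure. The names ~literal~ and ~star~ are hypothetical and are not part of langlang.

```python
# Sketch of STAR as a combinator; names are hypothetical, not langlang's API.

def literal(s):
    # Match the exact string s at pos; return the new position or None.
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def star(expr):
    # Apply expr repeatedly until it fails. STAR itself never fails:
    # zero successful repetitions simply leave pos unchanged.
    def parse(text, pos):
        while True:
            nxt = expr(text, pos)
            # Stop on failure, and also when expr matched without
            # consuming anything (a guard added in this sketch to
            # avoid looping forever).
            if nxt is None or nxt == pos:
                return pos
            pos = nxt
    return parse


many_ab = star(literal("ab"))
print(many_ab("ababx", 0))  # two repetitions consumed
print(many_ab("xyz", 0))    # zero repetitions, still a success
```

Because STAR succeeds even with zero repetitions, ~File <- Line*~ also accepts an empty input.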
*** Line
Line <- Val (',' Val)* '\n'
Both the ~File~ and ~Line~ productions use the STAR operator and call out to other productions. ~Line~ also introduces parentheses for grouping: the group matches the COMMA (~,~) character followed by a ~Val~ call, and the STAR applies to the whole group, so the group is matched zero or more times. The line has to end with the NEWLINE (~\n~) escape character.
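Since the expression syntax borrows from Regular Expressions, the ~Line~ rule can be compared with a rough regex counterpart. The sketch below uses Python's standard ~re~ module, with ~Val~ inlined as ~[^,\n]*~. It is only an analogy for how the group and the STAR interact, not how langlang evaluates the rule.

```python
import re

# Rough regex analogue of:  Line <- Val (',' Val)* '\n'
# Val is inlined as [^,\n]* and (?: ... ) is a non-capturing group.
LINE = re.compile(r"[^,\n]*(?:,[^,\n]*)*\n")

print(bool(LINE.fullmatch("c1,c2\n")))  # group matched once
print(bool(LINE.fullmatch("c1\n")))     # group matched zero times
print(bool(LINE.fullmatch("c1,c2")))    # fails: trailing newline missing
```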
*** Val
Val <- (![,\n] .)*
The production ~Val~ demonstrates another similarity with Regular Expressions: the character class selector (~[]~). That same selector also takes ranges (e.g. ~[0-9]~, ~[a-zA-Z]~, etc.). It also demonstrates the ANY (~.~) matcher, which succeeds on any single character and fails only when matched against ~EOF~.
This same production also uses the NOT (~!~) operator. Although it may look similar to negation in Regular Expressions, its meaning in Parsing Expression Grammars is significantly different. The NOT operator has a very special property: it never consumes any input, even when it succeeds. For that reason, NOT is usually followed by something that actually consumes input. In the case above, ~![,\n] .~ matches any single character that is neither a COMMA (~,~) nor a NEWLINE (~\n~), and the surrounding STAR repeats that match zero or more times.
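The interplay between NOT and ANY can be sketched in Python as follows. As before, this is an illustration under assumptions rather than langlang's code: parsers are modelled as functions from ~(text, pos)~ to a new position or ~None~, and the names ~any_char~, ~char_class~, ~not_~ and ~seq~ are hypothetical.

```python
# Sketch of  ![,\n] .  with hypothetical combinators (not langlang's API).

def any_char(text, pos):
    # ANY (.): consume one character; fail only at end of input.
    return pos + 1 if pos < len(text) else None


def char_class(chars):
    # [...]: consume one character if it is in the given set.
    def parse(text, pos):
        return pos + 1 if pos < len(text) and text[pos] in chars else None
    return parse


def not_(expr):
    # NOT (!): succeed only if expr fails here, and never move pos.
    def parse(text, pos):
        return pos if expr(text, pos) is None else None
    return parse


def seq(*exprs):
    # Sequence: run each expression in order; fail if any one fails.
    def parse(text, pos):
        for expr in exprs:
            pos = expr(text, pos)
            if pos is None:
                return None
        return pos
    return parse


not_sep = not_(char_class(",\n"))
val_char = seq(not_sep, any_char)  # ![,\n] .

print(not_sep("a,b", 0))   # succeeds, position unchanged
print(val_char("a,b", 0))  # the ANY that follows consumes the "a"
print(val_char("a,b", 1))  # fails: the next character is a comma
```

On its own, ~not_sep~ only checks what comes next. It is the ANY after it that moves the position forward, which is why the two appear together in ~Val~.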