Evolution of... dynparser
Basic execution flow
```txt
Text -> Parsing -> Transform -> Text
```
More info about the peg syntax below.
Add to `Cargo.toml`:

```toml
[dependencies]
dpr = "0.1.0"
```
Watch the examples below.
```txt
0.1.0 First version
```
Given an extended peg grammar, it will verify the input and can generate an output based on transformation rules.

But let's see it with examples.
Starting with this peg:

Peg:
```text
main = char+
char = 'a' -> A
     / 'b' -> B
     / .
```
Given this input
Input:
```text
aaacbbabdef
```
We get as a result:

Output:
```text
AAAcBBABdef
```
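To make the semantics concrete, here is a hand-written Rust equivalent of this grammar, with no dpr involved. It is only a sketch for intuition, not how dpr works internally; `transform` is a hypothetical name, and it assumes, as the output suggests, that a char matched by `.` without a transformation is copied unchanged.

```rust
/// Hand-written equivalent of:
///     main = char+
///     char = 'a' -> A
///          / 'b' -> B
///          / .
/// Returns None when `main` fails (only on empty input, since `char+` needs one char).
fn transform(input: &str) -> Option<String> {
    if input.is_empty() {
        return None; // `char+` requires at least one char
    }
    let output = input
        .chars()
        .map(|c| match c {
            'a' => 'A',     // 'a' -> A
            'b' => 'B',     // 'b' -> B
            other => other, // `.` has no transformation: the matched char is copied
        })
        .collect();
    Some(output)
}
```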
Addition calculator example
Peg:
```text
main = expr

expr = num:num              -> PUSH $(num)$(:endl)
       (op:op expr:expr)?   -> $(expr)EXEC $(op)$(:endl)

op   = '+'  -> ADD
     / '-'  -> SUB

num  = [0-9]+ ('.' [0-9]+)?
```
Input:
```text
1+2-3
```
Output:
```text
PUSH 1
PUSH 2
PUSH 3
EXEC SUB
EXEC ADD
```
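The transformation rules build the output bottom-up: the nested `expr` is written first, then `EXEC $(op)`. As a sketch of that behaviour, here is a hand-written recursive-descent equivalent in plain Rust (no dpr; the `parse_*` names are hypothetical). Because `expr` recurses on its right side, `1+2-3` is grouped as `1+(2-3)`.

```rust
// Hand-written equivalent of the addition grammar (a sketch, not dpr's generated code).
// Each function returns what it produced plus the unconsumed input, or None on failure.

/// expr: "PUSH <num>\n", then (if `op expr` follows) the nested expr and "EXEC <op>\n".
fn parse_expr(input: &str) -> Option<(String, &str)> {
    let (number, rest) = parse_num(input)?;
    let mut out = format!("PUSH {}\n", number);
    // Optional group `(op expr)?`: if it fails, nothing after the number is consumed.
    if let Some((op_name, after_op)) = parse_op(rest) {
        if let Some((nested, after_expr)) = parse_expr(after_op) {
            out.push_str(&nested); // $(expr) comes first...
            out.push_str(&format!("EXEC {}\n", op_name)); // ...then EXEC $(op)
            return Some((out, after_expr));
        }
    }
    Some((out, rest))
}

/// op = '+' -> ADD / '-' -> SUB
fn parse_op(input: &str) -> Option<(&'static str, &str)> {
    match input.chars().next()? {
        '+' => Some(("ADD", &input[1..])),
        '-' => Some(("SUB", &input[1..])),
        _ => None,
    }
}

/// num: one or more digits, optionally followed by '.' and more digits.
fn parse_num(input: &str) -> Option<(&str, &str)> {
    let count_digits = |s: &str| s.bytes().take_while(u8::is_ascii_digit).count();
    let mut len = count_digits(input);
    if len == 0 {
        return None;
    }
    if input[len..].starts_with('.') {
        let frac = count_digits(&input[len + 1..]);
        if frac > 0 {
            len += 1 + frac;
        }
    }
    Some(input.split_at(len))
}
```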
Basic text transformation flow.
```text
DSL flow

 .--------.
 |  peg   |
 |  user  |
 '--------'
      |
      v
 .--------.
 |  GEN   |
 | rules  |
 '--------'
      |            .----------.
      |            |  input   |
      |            |  user    |
      |            '----------'
      |                 |
      |                 v
      |            .----------.
      |            |  parse   |
      '----------->|          |
                   '----------'
                        |
                        v
                   .---------.
                   | replace |
                   |         |
                   '---------'
                        |
                        v
                   .--------.
                   | OUTPUT |
                   |        |
                   '--------'
```
The Rust code for the first example:

```rust
extern crate dpr;

fn main() -> Result<(), dpr::Error> {
    let result = dpr::Peg::new(
        "
        main    =   char+
        char    =   'a'     -> A
                /   'b'     -> B
                /   .
        ",
    )
    .gen_rules()?
    .parse("aaacbbabdef")?
    .replace()?
    //  ...
    ;

    println!("{:#?}", result);
    Ok(())
}
```
You've seen some examples; let's look at the syntax in detail.
| token        | Description                                                                  |
| ------------ | ---------------------------------------------------------------------------- |
| `=`          | On the left, the symbol; on the right, the expression defining it            |
| `symbol`     | A string without quotes or spaces, in ASCII                                  |
| `.`          | Any char                                                                     |
| `"..."`      | Literal delimited by quotes (single quotes, `'...'`, are also accepted)      |
| `<space>`    | Separates tokens and concatenates rules (and operation)                      |
| `/`          | Or operation                                                                 |
| `(...)`      | An expression composed of sub-expressions                                    |
| `?`          | One optional                                                                 |
| `*`          | Repeat 0 or more                                                             |
| `+`          | Repeat 1 or more                                                             |
| `!`          | Negates the expression: continue if it does not follow, without consuming   |
| `&`          | Verify that it follows, but without consuming                                |
| `[...]`      | Match chars: a list of chars, ranges, or both                                |
| `->`         | After the arrow, we have the transformation rule                             |
| `:`          | Gives a name, in order to use it later in the transformation                 |
| `error(...)` | Lets you define an error message when this rule is satisfied                 |
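To illustrate how these operators behave (ordered choice, non-consuming `!` and `&`, repetition bounds), here is a tiny PEG matcher in Rust. It is a self-contained sketch, not dpr's implementation; the `Peg` enum and the `matches` function are hypothetical. Each expression returns how many bytes it consumed, or `None` on failure.

```rust
/// A tiny PEG expression tree (hypothetical, for illustration only).
enum Peg {
    Lit(&'static str),                   // "..."
    Dot,                                 // .
    Set(Vec<(char, char)>),              // [...]: inclusive ranges, a single char c is (c, c)
    Seq(Vec<Peg>),                       // <space>: and
    Or(Vec<Peg>),                        // /: ordered choice, the first match wins
    Rep(Box<Peg>, usize, Option<usize>), // ? = (0, 1), * = (0, inf), + = (1, inf)
    Not(Box<Peg>),                       // !: succeeds if the inner expression fails
    Peek(Box<Peg>),                      // &: succeeds if the inner expression matches
}

/// Returns the number of bytes consumed at the start of `input`, or None on failure.
/// `Not` and `Peek` never consume input.
fn matches(p: &Peg, input: &str) -> Option<usize> {
    match p {
        Peg::Lit(s) => input.starts_with(*s).then(|| s.len()),
        Peg::Dot => input.chars().next().map(char::len_utf8),
        Peg::Set(ranges) => {
            let c = input.chars().next()?;
            ranges
                .iter()
                .any(|&(lo, hi)| lo <= c && c <= hi)
                .then(|| c.len_utf8())
        }
        Peg::Seq(items) => {
            let mut pos = 0;
            for item in items {
                pos += matches(item, &input[pos..])?; // any failure fails the sequence
            }
            Some(pos)
        }
        Peg::Or(alternatives) => alternatives.iter().find_map(|alt| matches(alt, input)),
        Peg::Rep(inner, min, max) => {
            let (min, max) = (*min, *max);
            let (mut pos, mut count) = (0, 0);
            while max.map_or(true, |m| count < m) {
                match matches(inner, &input[pos..]) {
                    // stop on failure, or on an empty match to avoid looping forever
                    None | Some(0) => break,
                    Some(n) => {
                        pos += n;
                        count += 1;
                    }
                }
            }
            (count >= min).then(|| pos)
        }
        Peg::Not(inner) => matches(inner, input).is_none().then(|| 0),
        Peg::Peek(inner) => matches(inner, input).map(|_| 0),
    }
}
```

Since `/` is an ordered choice, an alternative such as `"h" / "hi"` would always pick `"h"`, so the longer alternative should go first.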
Below is the grammar that defines valid peg inputs. BTW, this grammar has been parsed to generate the code that parses itself ;-)

Let's see it by example.
A simple literal string.
```peg
main = "Hello world"
```
Concatenation (and)
```peg
main = "Hello " "world"
```
Referencing symbols
Symbol
```peg
main = hi
hi   = "Hello world"
```
Or conditions /
```peg
main = "hello" / "hi"
```
Or multiline
```peg
main
    = "hello"
    / "hi"
    / "hola"
```
Or multiline 2
```peg
main = "hello"
     / "hi"
     / "hola"
```
Or disorganized
```peg
main = "hello"
     / "hi" / "hola"
```
Parenthesis
```peg
main = ("hello" / "hi") " world"
```
Just multiline
Multiline1
```peg
main
    = ("hello" / "hi") " world"
```
Multiline2
```peg
main
    = ("hello" / "hi")
      " world"
```
Multiline3
```peg
main = ("hello" / "hi")
       " world"
```
It is recommended to put the or operator `/` at the start of each new line, and `=` on the first line, like:
Multiline organized
```peg
main = ("hello" / "hi") " world"
     / "bye"
```
One optional
```peg
main = ("hello" / "hi") " world"?
```
Repetitions
```peg
main           = one_or_more_a / zero_or_many_b
one_or_more_a  = "a"+
zero_or_many_b = "b"*
```
Negation will not move the current position.

The next example will consume all chars until it gets an "a":
Negation
```peg
main = (!"a" .)* "a"
```
Consume till
```peg
comment = "//" (!"\n" .)*
        / "/*" (!"*/" .)* "*/"
```
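The `comment` rule reads as "after the opening, consume any char that does not start the terminator". A hand-written Rust equivalent makes this explicit, including the fact that an unterminated `/*` comment does not match at all. It is a sketch; `comment_len` is a hypothetical name.

```rust
/// Hand-written equivalent of
///     comment = "//" (!"\n" .)*
///             / "/*" (!"*/" .)* "*/"
/// Returns the byte length of the comment at the start of `input`, or None.
fn comment_len(input: &str) -> Option<usize> {
    if let Some(body) = input.strip_prefix("//") {
        // (!"\n" .)* : take chars up to a newline, without consuming the newline
        let end = body.find('\n').unwrap_or(body.len());
        return Some(2 + end);
    }
    if let Some(body) = input.strip_prefix("/*") {
        // (!"*/" .)* "*/" : everything up to the first "*/", then the terminator itself
        let end = body.find("*/")?; // an unterminated block comment fails
        return Some(2 + end + 2);
    }
    None
}
```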
Match a set of chars. Chars can be defined by range.
```peg
number      = digit+ ("." digit+)?
digit       = [0-9]
aorb        = [ab]
id          = [a-zA-Z] [a-zA-Z0-9]*
aorbordigit = [ab0-9]
```
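A character set is a list of single chars plus inclusive ranges, which is how the `match` rule of the peg grammar splits it (`CHARS` and `BETW`). A minimal Rust sketch of that representation follows; `CharSet` is a hypothetical type, not part of dpr, and it does not handle escapes or a literal `-` inside a range.

```rust
/// A character set such as `[ab0-9]`: single chars plus inclusive ranges (hypothetical type).
struct CharSet {
    chars: Vec<char>,
    ranges: Vec<(char, char)>,
}

impl CharSet {
    /// Parses the text between the brackets, e.g. "ab0-9" or "a-zA-Z".
    /// `x-y` is read as a range; anything else is a single char.
    fn parse(spec: &str) -> CharSet {
        let cs: Vec<char> = spec.chars().collect();
        let (mut chars, mut ranges) = (Vec::new(), Vec::new());
        let mut i = 0;
        while i < cs.len() {
            if i + 2 < cs.len() && cs[i + 1] == '-' {
                ranges.push((cs[i], cs[i + 2]));
                i += 3;
            } else {
                chars.push(cs[i]);
                i += 1;
            }
        }
        CharSet { chars, ranges }
    }

    /// True if `c` is one of the single chars or falls inside any range.
    fn contains(&self, c: char) -> bool {
        self.chars.contains(&c) || self.ranges.iter().any(|&(lo, hi)| lo <= c && c <= hi)
    }
}
```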
Simple recursion
One or more "a", defined recursively:

```peg
as = "a" as
   / "a"

// simplified with +
ak = "a"+
```
Recursion to match parentheses
Recursion match par
```peg
match_par = "(" match_par ")"
          / "(" ")"
```
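The recursion behaves like a recursive function that calls itself for the inner pair and backtracks to the second alternative when the first one fails. A hand-written Rust sketch (the `match_par` function is hypothetical, no dpr involved) that returns the number of bytes matched:

```rust
/// Hand-written equivalent of
///     match_par = "(" match_par ")"
///               / "(" ")"
/// Returns the number of bytes matched at the start of `input`, or None.
fn match_par(input: &str) -> Option<usize> {
    let rest = input.strip_prefix('(')?;
    // First alternative: "(" match_par ")"
    if let Some(inner) = match_par(rest) {
        if rest[inner..].starts_with(')') {
            return Some(1 + inner + 1);
        }
    }
    // Second alternative, after backtracking: "(" ")"
    rest.starts_with(')').then(|| 2)
}
```

As written, the rule matches only nested pairs such as `((()))`, not sequences such as `()()`.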
In order to produce custom errors, you have to use the `error(...)` constructor.

In the next example, the system will complain with a parenthesis error if the parentheses are unbalanced:
```peg
parenth = '(' _ expr _ (  ')'
                       /  error("unbalanced parenthesis: missing ')'")
                       )
```
As you can see, if the rule can close the parenthesis properly, everything is OK; otherwise, the custom error message will be produced.
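The difference between a normal failure and `error(...)` is that a failure lets the caller try other alternatives, while `error(...)` stops parsing with a message. The Rust sketch below models that with `Result<Option<usize>, String>`. To keep it short, `expr` is simplified to a run of digits and `_` to optional spaces; both are assumptions of this example, not the grammar's definitions, and `parenth`/`count_spaces` are hypothetical names.

```rust
/// Sketch of
///     parenth = '(' _ expr _ ( ')' / error("unbalanced parenthesis: missing ')'") )
/// with `expr` simplified to digits and `_` to optional spaces.
/// Ok(Some(n)): matched n bytes. Ok(None): ordinary failure, other alternatives may be tried.
/// Err(msg): the `error(...)` alternative was reached, so parsing stops with `msg`.
fn parenth(input: &str) -> Result<Option<usize>, String> {
    let rest = match input.strip_prefix('(') {
        Some(rest) => rest,
        None => return Ok(None),
    };
    let mut pos = 1 + count_spaces(rest);
    let digits = input[pos..].bytes().take_while(u8::is_ascii_digit).count();
    if digits == 0 {
        return Ok(None); // `expr` failed: an ordinary failure
    }
    pos += digits;
    pos += count_spaces(&input[pos..]);
    if input[pos..].starts_with(')') {
        Ok(Some(pos + 1))
    } else {
        // ')' failed, so the ordered choice falls through to error(...)
        Err("unbalanced parenthesis: missing ')'".to_string())
    }
}

/// Number of leading spaces (the simplified `_`).
fn count_spaces(s: &str) -> usize {
    s.bytes().take_while(|&b| b == b' ').count()
}
```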
You can set the replacement rules with `->`:

```text
op = '+' -> ADD
   / '-' -> SUB
```
When `+` is found and validated, it will be replaced by `ADD`.
```text
expr = num:num              -> PUSH $(num)$(:endl)
       (op:op expr:expr)?   -> $(expr)EXEC $(op)$(:endl)
```
To refer to a parsed chunk, you can name it using `:`. When referring to a symbol, you don't need to give it a name.

The next examples are equivalent:
```text
expr = num:num              -> PUSH $(num)$(:endl)
       (op:op expr:expr)?   -> $(expr)EXEC $(op)$(:endl)
```

```text
expr = num                  -> PUSH $(num)$(:endl)
       (op expr)?           -> $(expr)EXEC $(op)$(:endl)
```
The arrow applies to the current line. If you need to apply a transformation over several lines, you will have to use `(...)`.
The grammar that parses peg grammars, in the file gcode/peg2code.rs, can serve as an example.
After the arrow, you will have the transformation rule.
Replacing tokens: things inside `$(...)` will be replaced. Text outside them will be written as is.
Replacing tokens can refer to parsed text by name or by position.

```text
-> $(num)
```
This will look for a name called `num`, defined on the left side, and write it to the output.

The next line will also look for names, but it will not complain if `rep_symbol` doesn't exist:

```txt
rep_or_unary = atom_or_par rep_symbol?  -> $(?rep_symbol)$(atom_or_par)
```
You can also refer to an element by position:

```text
-> $(.1)
```
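These lookup rules can be sketched in a few lines of Rust. This is not dpr's replace code: the `expand` function is hypothetical, it does not handle `$(:...)` functions, and it assumes positions are 1-based, which is what the `$(.2)EXEC $(.1)` example of this part suggests (`.1` is `op`, `.2` is `expr`).

```rust
use std::collections::HashMap;

/// Expands `$(name)`, `$(?name)` and `$(.N)` tokens in a transformation template (sketch).
/// `named` maps names to parsed text; `positional` holds the parsed elements in order.
/// Assumption: positions are 1-based. `$(:...)` function tokens are not handled here.
fn expand(template: &str, named: &HashMap<&str, &str>, positional: &[&str]) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("$(") {
        out.push_str(&rest[..start]); // text outside $(...) is written as is
        let after = &rest[start + 2..];
        let end = after.find(')').ok_or("unclosed $(")?;
        let token = &after[..end];
        if let Some(name) = token.strip_prefix('?') {
            // optional name: expands to nothing when it does not exist
            out.push_str(named.get(name).copied().unwrap_or(""));
        } else if let Some(pos) = token.strip_prefix('.') {
            let index: usize = pos.parse().map_err(|_| format!("bad position: {pos}"))?;
            let element = index
                .checked_sub(1)
                .and_then(|i| positional.get(i))
                .ok_or_else(|| format!("no element at position {index}"))?;
            out.push_str(element);
        } else {
            // by name: a missing name is an error
            let value = named.get(token).ok_or_else(|| format!("unknown name: {token}"))?;
            out.push_str(value);
        }
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}
```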
You can also call functions by starting the replacing token with `:`:

```text
expr = num -> $(:endl)
```
Predefined functions are... (see replace.rs for the full list of replace functions)

```rust
// excerpt of the match arms in replace.rs
"endl" => "\n",
"spc" => " ",
"_" => " ",
"tab" => "\t",
"(" => "\t",
// "now" => "pending",
_ => "?unknown_fn?",
```
Example
```text
expr = num            -> PUSH $(num)$(:endl)
       (op expr)?     -> $(.2)EXEC $(.1)$(:endl)
```
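Function tokens can be resolved with a simple lookup. The sketch below checks some of the predefined names from the excerpt above (the `"("` entry is left out) and then falls back to an optional user callback, in the spirit of the external functions described next. The `call_fn` name, the callback type, and the lookup order are assumptions of this sketch, not dpr's API.

```rust
/// Resolves a `$(:name)` function token (sketch; names and lookup order are assumptions).
/// Predefined names are checked first, then the optional user callback.
fn call_fn(name: &str, custom: Option<&dyn Fn(&str) -> Option<String>>) -> String {
    let predefined = match name {
        "endl" => Some("\n"),
        "spc" | "_" => Some(" "),
        "tab" => Some("\t"),
        _ => None,
    };
    if let Some(text) = predefined {
        return text.to_string();
    }
    custom
        .and_then(|f| f(name))
        .unwrap_or_else(|| "?unknown_fn?".to_string())
}
```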
You can define your own functions (aka external functions).

In the next example, we create the replacement token `el`:
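```rust
extern crate dpr;

fn main() -> Result<(), dpr::Error> {
    let result = dpr::Peg::new(
        "
        main    =   char+
        char    =   'a'     -> $(:el)A
                /   'b'     -> $(:el)B
                /   ch:.    -> $(:el)$(ch)
        ",
    )
    .gen_rules()?
    .parse("aaacbbabdef")?
    .replace(Some(&dpr::FnCallBack(custom_functions)))?
    //  ...
    ;

    println!("{:#?}", result);
    println!("{}", result.str());
    Ok(())
}

// The end of this function was lost in the original text. The return type and body
// below are a reconstruction (assumption): `el` expands to a newline, other names are unknown.
fn custom_functions(fn_txt: &str) -> Option<String> {
    match fn_txt {
        "el" => Some("\n".to_string()),
        _ => None,
    }
}
```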
What is a parser without a math expression calculator?

Obviously, it's necessary to consider operator priority, operator associativity, parentheses, negative numbers and negative expressions.
```rust
extern crate dpr;

fn main() -> Result<(), dpr::Error> {
    let result = dpr::Peg::new(
        r#"
main    =   expr

expr    =   term    (
                    _  add_op   _  term     ->$(term)$(add_op)
                    )*

term    =   factor  (
                    _  mult_op  _  factor   ->$(factor)$(mult_op)
                    )*

factor  =   pow     (
                    _  pow_op   _  subexpr  ->$(subexpr)$(pow_op)
                    )*

pow     =   subexpr (
                    _  pow_op   _  pow      ->$(pow)$(pow_op)
                    )*

subexpr =   '(' _ expr _                    ->$(expr)
                (  ')'                      ->$(:none)
                /  error("parenthesis error")
                )
        /   number                          ->PUSH $(number)$(:endl)
        /   '-' _ subexpr                   ->PUSH 0$(:endl)$(subexpr)EXEC SUB$(:endl)

number  =   ([0-9]+  ('.' [0-9]+)?)

add_op  =   '+'     ->EXEC ADD$(:endl)
        /   '-'     ->EXEC SUB$(:endl)

mult_op =   '*'     ->EXEC MUL$(:endl)
        /   '/'     ->EXEC DIV$(:endl)

pow_op  =   '^'     ->EXEC POW$(:endl)

_       =   ' '*
        "#,
    )
    .gen_rules()?
    .parse("-(-1+2* 3^5 ^(- 2 ) -7)+8")?
    .replace()?
    //  ...
    ;

    println!("{:#?}", result);
    println!("{}", result.str());
    Ok(())
}
```
The output is a program for a stack machine, composed of commands with a parameter:
```text
PUSH 0
PUSH 0
PUSH 1
EXEC SUB
PUSH 2
PUSH 3
PUSH 5
PUSH 0
PUSH 2
EXEC SUB
EXEC POW
EXEC POW
EXEC MUL
EXEC ADD
PUSH 7
EXEC SUB
EXEC SUB
PUSH 8
EXEC ADD
```
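To check that the program is correct, it can be run on a tiny stack machine. The interpreter below is a sketch (the `run` function is hypothetical): it assumes `f64` values and that `EXEC` pops the right operand first, then the left one. For the program above it should leave a single value, the result of `-(-1+2*3^(5^(-2))-7)+8`, roughly 13.91.

```rust
/// Minimal interpreter for the generated stack-machine program (sketch).
/// Assumptions: values are f64; EXEC pops the right operand first, then the left one.
fn run(program: &str) -> Result<f64, String> {
    let mut stack: Vec<f64> = Vec::new();
    for line in program.lines().map(str::trim).filter(|l| !l.is_empty()) {
        let (cmd, arg) = line.split_once(' ').ok_or_else(|| format!("bad line: {line}"))?;
        match cmd {
            "PUSH" => stack.push(arg.parse().map_err(|_| format!("bad number: {arg}"))?),
            "EXEC" => {
                let b = stack.pop().ok_or("stack underflow")?; // right operand
                let a = stack.pop().ok_or("stack underflow")?; // left operand
                stack.push(match arg {
                    "ADD" => a + b,
                    "SUB" => a - b,
                    "MUL" => a * b,
                    "DIV" => a / b,
                    "POW" => a.powf(b),
                    other => return Err(format!("unknown op: {other}")),
                });
            }
            other => return Err(format!("unknown command: {other}")),
        }
    }
    // A well-formed program leaves exactly one value: the result.
    match stack.as_slice() {
        [result] => Ok(*result),
        _ => Err(format!("expected one value on the stack, found {}", stack.len())),
    }
}
```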
At the moment, the grammar that defines valid peg inputs is the following (for an up-to-date reference, open the peg2code.rs file :-)
```rust
fn text_peg2code() -> &'static str {
    r#"
    /*      A peg grammar to parse peg grammars
     *
     */
main = grammar -> $(grammar)EOP
grammar = rule+
symbol = [_a-zA-Z0-9] [_'"a-zA-Z0-9]*
rule = _ rule_name _ '=' _ expr _eol _ -> RULE$(:endl)$(rule_name)$(:endl)$(expr)
rule_name = symbol
expr = or -> OR$(:endl)$(or)CLOSE_MEXPR$(:endl)
or = _ and -> AND$(:endl)$(and)CLOSE_MEXPR$(:endl)
( _ '/' _ or )? -> $(or)
and = error
/ (andline transf2 and:(
_ ->$(:none)
!(rule_name _ ('=' / '{')) and )?) -> TRANSF2$(:endl)$(transf2)EOTRANSF2$(:endl)AND$(:endl)$(andline)CLOSE_MEXPR$(:endl)$(and)
/ andline (
( ' ' / comment )* eol+ _ -> $(:none)
!( rule_name _ ('=' / '{') ) and
)?
error = 'error' _ '(' _ literal _ ')' -> ERROR$(:endl)$(literal)$(:endl)
andline = andchunk (
' '+ ->$(:none)
( error / andchunk )
)*
andchunk = name e:rep_or_unary -> NAMED$(:endl)$(name)$(:endl)$(e)
/ rep_or_unary
// this is the and separator
_1 = ' ' / eol -> $(:none)
// repetitions or unary operator
rep_or_unary = atom_or_par rep_symbol? -> $(?rep_symbol)$(atom_or_par)
// atom_or_par -> $(atom_or_par)
/ '!' atom_or_par -> NEGATE$(:endl)$(atom_or_par)
/ '&' atom_or_par -> PEEK$(:endl)$(atom_or_par)
rep_symbol = '*' -> REPEAT$(:endl)0$(:endl)inf$(:endl)
/ '+' -> REPEAT$(:endl)1$(:endl)inf$(:endl)
/ '?' -> REPEAT$(:endl)0$(:endl)1$(:endl)
atom_or_par = atom / parenth
parenth = '(' _ expr _ -> $(expr)
( ')' -> $(:none)
/ error("unbalanced parethesis: missing ')'")
)
atom = a:literal -> ATOM$(:endl)LIT$(:endl)$(a)$(:endl)
/ a:match -> MATCH$(:endl)$(a)
/ a:rule_name -> ATOM$(:endl)RULREF$(:endl)$(a)$(:endl)
/ dot -> ATOM$(:endl)DOT$(:endl)
// as rule_name can start with a '.', dot has to be after rule_name
literal = lit_noesc / lit_esc
lit_noesc = _' l:( !_' . )* _' -> $(l)
_' = "'"
lit_esc = (_"
l:( esc_char
/ hex_char
/ !_" .
)*
_") -> $(l)
_" = '"'
esc_char = '\r'
/ '\n'
/ '\t'
/ '\\'
/ '\\"'
hex_char = '\0x' [0-9A-F] [0-9A-F]
eol = "\r\n" / "\n" / "\r"
_eol = (' ' / comment)* eol
match = '[' -> $(:none)
(
mchars b:(mbetween*) -> CHARS$(:endl)$(mchars)$(:endl)BETW$(:endl)$(b)EOBETW$(:endl)
/ b:(mbetween+) -> BETW$(:endl)$(b)EOBETW$(:endl)
)
']' -> $(:none)
mchars = (!']' !(. '-') .)+
mbetween = f:. '-' s:. -> $(f)$(:endl)$(s)$(:endl)
dot = '.'
_ = (
( ' '
/ eol
/ comment
)*
) -> $(:none)
comment = ( line_comment
/ mline_comment
) -> $(:none)
line_comment = '//' (!eol .)*
mline_comment = '/*' (!'*/' .)* '*/'
name = symbol ":" -> $(symbol)
transf2 = _1 _ '->' ' '* -> $(:none)
transf_rule -> $(transf_rule)
&eol
transf_rule = ( tmpl_text / tmpl_rule )+
tmpl_text = t:( (!("$(" / eol) .)+ ) -> TEXT$(:endl)$(t)$(:endl)
tmpl_rule = "$(" -> $(:none)
(
// by name optional
'?' symbol ->NAMED_OPT$(:endl)$(symbol)$(:endl)
// by name
/ symbol ->NAMED$(:endl)$(symbol)$(:endl)
// by pos
/ "." pos:([0-9]+) ->POS$(:endl)$(symbol)$(pos)$(:endl)
// by function
/ ":" ->$(:none)
fn:((!(")" / eol) .)+) ->FUNCT$(:endl)$(fn)$(:endl)
)
")" ->$(:none)
"#
}
```
As you can see, the code that parses the peg input is itself written as a text peg file.
How is it possible?
At the moment, the `rules_for_peg` code is:
```rust
// excerpt: the beginning of the enclosing function and macro call is not shown
r#"symbol"# => or!(and!(ematch!(chlist r#"_"# , from 'a', to 'z' , from 'A', to 'Z' , from '0', to '9' ), rep!(ematch!(chlist r#"_'""# , from 'a', to 'z' , from 'A', to 'Z' , from '0', to '9' ), 0)))
, r#"transf_rule"# => or!(and!(rep!(or!(and!(ref_rule!(r#"tmpl_text"#)), and!(ref_rule!(r#"tmpl_rule"#))), 1)))
, r#"mbetween"# => or!(and!(transf2!( and!( and!(named!("f", dot!()), lit!("-"), named!("s", dot!())) ) , t2rules!(t2_byname!("f"), t2_funct!("endl"), t2_byname!("s"), t2_funct!("endl"), ) )))
, r#"line_comment"# => or!(and!(lit!("//"), rep!(or!(and!(not!(ref_rule!(r#"eol"#)), dot!())), 0)))
, r#"lit_noesc"# => or!(and!(transf2!( and!( and!(ref_rule!(r#"_'"#), named!("l", rep!(or!(and!(not!(ref_rule!(r#"_'"#)), dot!())), 0)), ref_rule!(r#"_'"#)) ) , t2rules!(t2_byname!("l"), ) )))
, r#"error"# => or!(and!(transf2!( and!( and!(lit!("error"), ref_rule!(r#"_"#), lit!("("), ref_rule!(r#"_"#), ref_rule!(r#"literal"#), ref_rule!(r#"_"#), lit!(")")) ) , t2rules!(t2_text!("ERROR"), t2_funct!("endl"), t2_byname!("literal"), t2_funct!("endl"), ) )))
, r#"atom_or_par"# => or!(and!(ref_rule!(r#"atom"#)), and!(ref_rule!(r#"parenth"#)))
, r#"tmpl_text"# => or!(and!(transf2!( and!( and!(named!("t", or!(and!(rep!(or!(and!(not!(or!(and!(lit!("$(")), and!(ref_rule!(r#"eol"#)))), dot!())), 1))))) ) , t2rules!(t2_text!("TEXT"), t2_funct!("endl"), t2_byname!("t"), t2_funct!("endl"), ) )))
, r#"atom"# => or!(and!(transf2!( and!( and!(named!("a", ref_rule!(r#"literal"#))) ) , t2rules!(t2_text!("ATOM"), t2_funct!("endl"), t2_text!("LIT"), t2_funct!("endl"), t2_byname!("a"), t2_funct!("endl"), ) )), and!(transf2!( and!( and!(named!("a", ref_rule!(r#"match"#))) ) , t2rules!(t2_text!("MATCH"), t2_funct!("endl"), t2_byname!("a"), ) )), and!(transf2!( and!( and!(named!("a", ref_rule!(r#"rule_name"#))) ) , t2rules!(t2_text!("ATOM"), t2_funct!("endl"), t2_text!("RULREF"), t2_funct!("endl"), t2_byname!("a"), t2_funct!("endl"), ) )), and!(transf2!( and!( and!(ref_rule!(r#"dot"#)) ) , t2rules!(t2_text!("ATOM"), t2_funct!("endl"), t2_text!("DOT"), t2_funct!("endl"), ) )))
, r#"grammar"# => or!(and!(rep!(ref_rule!(r#"rule"#), 1)))
, r#"dot"# => or!(and!(lit!(".")))
, r#"eol"# => or!(and!(lit!("\r\n")), and!(lit!("\n")), and!(lit!("\r")))
, r#"expr"# => or!(and!(transf2!( and!( and!(ref_rule!(r#"or"#)) ) , t2rules!(t2_text!("OR"), t2_funct!("endl"), t2_byname!("or"), t2_text!("CLOSE_MEXPR"), t2_funct!("endl"), ) )))
, r#"name"# => or!(and!(transf2!( and!( and!(ref_rule!(r#"symbol"#), lit!(":")) ) , t2rules!(t2_byname!("symbol"), ) )))
, r#"literal"# => or!(and!(ref_rule!(r#"lit_noesc"#)), and!(ref_rule!(r#"lit_esc"#)))
, r#"rule"# => or!(and!(transf2!( and!( and!(ref_rule!(r#"_"#), ref_rule!(r#"rule_name"#), ref_rule!(r#"_"#), lit!("="), ref_rule!(r#"_"#), ref_rule!(r#"expr"#), ref_rule!(r#"_eol"#), ref_rule!(r#"_"#)) ) , t2rules!(t2_text!("RULE"), t2_funct!("endl"), t2_byname!("rule_name"), t2_funct!("endl"), t2_byname!("expr"), ) )))
, r#"andchunk"# => or!(and!(transf2!( and!( and!(ref_rule!(r#"name"#), named!("e", ref_rule!(r#"rep_or_unary"#))) ) , t2rules!(t2_text!("NAMED"), t2_funct!("endl"), t2_byname!("name"), t2_funct!("endl"), t2_byname!("e"), ) )), and!(ref_rule!(r#"rep_or_unary"#)))
, r#"_""# => or!(and!(lit!("\"")))
, r#"mline_comment"# => or!(and!(lit!("/*"), rep!(or!(and!(not!(lit!("*/")), dot!())), 0), lit!("*/")))
, r#"andline"# => or!(and!(ref_rule!(r#"andchunk"#), rep!(or!(and!(transf2!( and!( and!(rep!(lit!(" "), 1)) ) , t2rules!(t2_funct!("none"), ) ), or!(and!(ref_rule!(r#"error"#)), and!(ref_rule!(r#"andchunk"#))))), 0)))
, r#"hex_char"# => or!(and!(lit!("\0x"), ematch!(chlist r#""# , from '0', to '9' , from 'A', to 'F' ), ematch!(chlist r#""# , from '0', to '9' , from 'A', to 'F' )))
, r#"mchars"# => or!(and!(rep!(or!(and!(not!(lit!("]")), not!(or!(and!(dot!(), lit!("-")))), dot!())), 1)))
, r#"transf2"# => or!(and!(transf2!( and!( and!(ref_rule!(r#"_1"#), ref_rule!(r#"_"#), lit!("->"), rep!(lit!(" "), 0)) ) , t2rules!(t2_funct!("none"), ) ), transf2!( and!( and!(ref_rule!(r#"transf_rule"#)) ) , t2rules!(t2_byname!("transf_rule"), ) ), peek!(ref_rule!(r#"eol"#))))
, r#"_'"# => or!(and!(lit!("'")))
, r#"and"# => or!(and!(ref_rule!(r#"error"#)), and!(transf2!( and!( and!(or!(and!(ref_rule!(r#"andline"#), ref_rule!(r#"transf2"#), named!("and", rep!(or!(and!(transf2!( and!( and!(ref_rule!(r#"_"#)) ) , t2rules!(t2_funct!("none"), ) ), not!(or!(and!(ref_rule!(r#"rule_name"#), ref_rule!(r#"_"#), or!(and!(lit!("=")), and!(lit!("{")))))), ref_rule!(r#"and"#))), 0, 1))))) ) , t2rules!(t2_text!("TRANSF2"), t2_funct!("endl"), t2_byname!("transf2"), t2_text!("EOTRANSF2"), t2_funct!("endl"), t2_text!("AND"), t2_funct!("endl"), t2_byname!("andline"), t2_text!("CLOSE_MEXPR"), t2_funct!("endl"), t2_byname!("and"), ) )), and!(ref_rule!(r#"andline"#), rep!(or!(and!(transf2!( and!( and!(rep!(or!(and!(lit!(" ")), and!(ref_rule!(r#"comment"#))), 0), rep!(ref_rule!(r#"eol"#), 1), ref_rule!(r#"_"#)) ) , t2rules!(t2_funct!("none"), ) ), not!(or!(and!(ref_rule!(r#"rule_name"#), ref_rule!(r#"_"#), or!(and!(lit!("=")), and!(lit!("{")))))), ref_rule!(r#"and"#))), 0, 1)))
, r#"_"# => or!(and!(transf2!( and!( and!(or!(and!(rep!(or!(and!(lit!(" ")), and!(ref_rule!(r#"eol"#)), and!(ref_rule!(r#"comment"#))), 0)))) ) , t2rules!(t2_funct!("none"), ) )))
, r#"or"# => or!(and!(transf2!( and!( and!(ref_rule!(r#"_"#), ref_rule!(r#"and"#)) ) , t2rules!(t2_text!("AND"), t2_funct!("endl"), t2_byname!("and"), t2_text!("CLOSE_MEXPR"), t2_funct!("endl"), ) ), transf2!( and!( and!(rep!(or!(and!(ref_rule!(r#"_"#), lit!("/"), ref_rule!(r#"_"#), ref_rule!(r#"or"#))), 0, 1)) ) , t2rules!(t2_byname!("or"), ) )))
, r#"_1"# => or!(and!(lit!(" ")), and!(transf2!( and!( and!(ref_rule!(r#"eol"#)) ) , t2rules!(t2_funct!("none"), ) )))
, r#"lit_esc"# => or!(and!(transf2!( and!( and!(or!(and!(ref_rule!(r#"_""#), named!("l", rep!(or!(and!(ref_rule!(r#"esc_char"#)), and!(ref_rule!(r#"hex_char"#)), and!(not!(ref_rule!(r#"_""#)), dot!())), 0)), ref_rule!(r#"_""#)))) ) , t2rules!(t2_byname!("l"), ) )))
, r#"rep_symbol"# => or!(and!(transf2!( and!( and!(lit!("*")) ) , t2rules!(t2_text!("REPEAT"), t2_funct!("endl"), t2_text!("0"), t2_funct!("endl"), t2_text!("inf"), t2_funct!("endl"), ) )), and!(transf2!( and!( and!(lit!("+")) ) , t2rules!(t2_text!("REPEAT"), t2_funct!("endl"), t2_text!("1"), t2_funct!("endl"), t2_text!("inf"), t2_funct!("endl"), ) )), and!(transf2!( and!( and!(lit!("?")) ) , t2rules!(t2_text!("REPEAT"), t2_funct!("endl"), t2_text!("0"), t2_funct!("endl"), t2_text!("1"), t2_funct!("endl"), ) )))
, r#"parenth"# => or!(and!(transf2!( and!( and!(lit!("("), ref_rule!(r#"_"#), ref_rule!(r#"expr"#), ref_rule!(r#"_"#)) ) , t2rules!(t2_byname!("expr"), ) ), or!(and!(transf2!( and!( and!(lit!(")")) ) , t2rules!(t2_funct!("none"), ) )), and!(error!("unbalanced parethesis: missing ')'")))))
, r#"comment"# => or!(and!(transf2!( and!( and!(or!(and!(ref_rule!(r#"line_comment"#)), and!(ref_rule!(r#"mline_comment"#)))) ) , t2rules!(t2_funct!("none"), ) )))
, r#"_eol"# => or!(and!(rep!(or!(and!(lit!(" ")), and!(ref_rule!(r#"comment"#))), 0), ref_rule!(r#"eol"#)))
, r#"esc_char"# => or!(and!(lit!("\r")), and!(lit!("\n")), and!(lit!("\t")), and!(lit!("\\")), and!(lit!("\\\"")))
, r#"match"# => or!(and!(transf2!( and!( and!(lit!("[")) ) , t2rules!(t2_funct!("none"), ) ), or!(and!(transf2!( and!( and!(ref_rule!(r#"mchars"#), named!("b", or!(and!(rep!(ref_rule!(r#"mbetween"#), 0))))) ) , t2rules!(t2_text!("CHARS"), t2_funct!("endl"), t2_byname!("mchars"), t2_funct!("endl"), t2_text!("BETW"), t2_funct!("endl"), t2_byname!("b"), t2_text!("EOBETW"), t2_funct!("endl"), ) )), and!(transf2!( and!( and!(named!("b", or!(and!(rep!(ref_rule!(r#"mbetween"#), 1))))) ) , t2rules!(t2_text!("BETW"), t2_funct!("endl"), t2_byname!("b"), t2_text!("EOBETW"), t2_funct!("endl"), ) ))), transf2!( and!( and!(lit!("]")) ) , t2rules!(t2_funct!("none"), ) )))
, r#"rep_or_unary"# => or!(and!(transf2!( and!( and!(ref_rule!(r#"atom_or_par"#), rep!(ref_rule!(r#"rep_symbol"#), 0, 1)) ) , t2rules!(t2_byname_opt!("rep_symbol"), t2_byname!("atom_or_par"), ) )), and!(transf2!( and!( and!(lit!("!"), ref_rule!(r#"atom_or_par"#)) ) , t2rules!(t2_text!("NEGATE"), t2_funct!("endl"), t2_byname!("atom_or_par"), ) )), and!(transf2!( and!( and!(lit!("&"), ref_rule!(r#"atom_or_par"#)) ) , t2rules!(t2_text!("PEEK"), t2_funct!("endl"), t2_byname!("atom_or_par"), ) )))
, r#"tmpl_rule"# => or!(and!(transf2!( and!( and!(lit!("$(")) ) , t2rules!(t2_funct!("none"), ) ), or!(and!(transf2!( and!( and!(lit!("?"), ref_rule!(r#"symbol"#)) ) , t2rules!(t2_text!("NAMED_OPT"), t2_funct!("endl"), t2_byname!("symbol"), t2_funct!("endl"), ) )), and!(transf2!( and!( and!(ref_rule!(r#"symbol"#)) ) , t2rules!(t2_text!("NAMED"), t2_funct!("endl"), t2_byname!("symbol"), t2_funct!("endl"), ) )), and!(transf2!( and!( and!(lit!("."), named!("pos", or!(and!(rep!(ematch!(chlist r#""# , from '0', to '9' ), 1))))) ) , t2rules!(t2_text!("POS"), t2_funct!("endl"), t2_byname!("symbol"), t2_byname!("pos"), t2_funct!("endl"), ) )), and!(transf2!( and!( and!(lit!(":")) ) , t2rules!(t2_funct!("none"), ) ), transf2!( and!( and!(named!("fn", or!(and!(rep!(or!(and!(not!(or!(and!(lit!(")")), and!(ref_rule!(r#"eol"#)))), dot!())), 1))))) ) , t2rules!(t2_text!("FUNCT"), t2_funct!("endl"), t2_byname!("fn"), t2_funct!("endl"), ) ))), transf2!( and!( and!(lit!(")")) ) , t2rules!(t2_funct!("none"), ) )))
, r#"main"# => or!(and!(transf2!( and!( and!(ref_rule!(r#"grammar"#)) ) , t2rules!(t2_byname!("grammar"), t2_text!("EOP"), ) )))
, r#"rule_name"# => or!(and!(ref_rule!(r#"symbol"#)))
)
}
```
Writing it by hand is difficult.

Isn't this program designed to receive a text peg grammar and a text input, and produce a text output?
IR stands for Intermediate Representation.
Why???
Once we parse the input, we have an AST. We could process the AST, but...

The AST is strongly coupled to the grammar. Most of the time, when we modify the grammar, we will need to modify the code that processes the AST.

Sometimes the grammar modification will be a syntax change, or a new feature that requires some syntax change, and therefore a different AST, while all, or almost all, of the concepts remain the same.

Imagine we wanted to add the function `sqrt` to the math expression compiler. We would need to modify the rules generator in order to process the new AST.
To decouple the peg grammar from the code that processes the AST, we will create the IR (Intermediate Representation).

How to get the IR will be defined in the peg grammar itself, as transformation rules.

An interpreter of the IR will produce the rules in memory. Later, we can generate the Rust code from the produced rules, or we could have a specific interpreter to generate it, but it's nice to get it from Rust data structures.

To develop this feature... we need a parser and a code generator... Hey!!! I have one. dpr does that!!!
How to generate the IR
```rust
// conceptual pipeline (pseudocode)
peg_grammar()
    .parse(peg_grammar())
    .gen_rules()
    .replace()
```
The `peg_grammar` will contain, in its transformation rules, the instructions to generate the IR.

Thanks to the IR, it's easy to modify this program, and we don't need to deal with an AST coupled to the peg grammar.
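As an illustration of an IR interpreter, here is a Rust sketch that turns a line-oriented IR into an in-memory rule tree. The token names (`RULE`, `OR`, `AND`, `ATOM`, `LIT`, `DOT`, `RULREF`, `REPEAT`, `CLOSE_MEXPR`, `EOP`) come from the transformation rules of the peg grammar shown earlier. The `Expr` enum, `IrReader` and `rules_from_ir` are hypothetical and only loosely mirror dpr's rule types; other IR tokens (`NAMED`, `TRANSF2`, `MATCH`, `NEGATE`, `PEEK`, ...) are left out.

```rust
/// Hypothetical in-memory rule tree built from the IR.
#[derive(Debug, PartialEq)]
enum Expr {
    Or(Vec<Expr>),
    And(Vec<Expr>),
    Lit(String),
    Dot,
    RuleRef(String),
    Repeat { expr: Box<Expr>, min: usize, max: Option<usize> },
}

/// Reads IR tokens, one per line.
struct IrReader<'a> {
    lines: std::iter::Peekable<std::str::Lines<'a>>,
}

impl<'a> IrReader<'a> {
    fn token(&mut self) -> Result<&'a str, String> {
        self.lines.next().ok_or_else(|| "unexpected end of IR".to_string())
    }

    /// One expression. OR/AND blocks run until their CLOSE_MEXPR;
    /// REPEAT is followed by min, max ("inf" = unbounded) and the repeated expression.
    fn expr(&mut self) -> Result<Expr, String> {
        match self.token()? {
            "OR" => Ok(Expr::Or(self.until_close()?)),
            "AND" => Ok(Expr::And(self.until_close()?)),
            "ATOM" => match self.token()? {
                "LIT" => Ok(Expr::Lit(self.token()?.to_string())),
                "DOT" => Ok(Expr::Dot),
                "RULREF" => Ok(Expr::RuleRef(self.token()?.to_string())),
                other => Err(format!("unsupported atom: {other}")),
            },
            "REPEAT" => {
                let min: usize = self.token()?.parse().map_err(|_| "bad REPEAT min".to_string())?;
                let max = match self.token()? {
                    "inf" => None,
                    n => Some(n.parse::<usize>().map_err(|_| "bad REPEAT max".to_string())?),
                };
                let expr = Box::new(self.expr()?);
                Ok(Expr::Repeat { expr, min, max })
            }
            other => Err(format!("unsupported IR token: {other}")),
        }
    }

    fn until_close(&mut self) -> Result<Vec<Expr>, String> {
        let mut items = Vec::new();
        while self.lines.peek() != Some(&"CLOSE_MEXPR") {
            items.push(self.expr()?); // fails with "unexpected end of IR" if CLOSE_MEXPR is missing
        }
        self.token()?; // consume CLOSE_MEXPR
        Ok(items)
    }
}

/// Builds (rule name, expression) pairs from a whole IR text ending with EOP.
fn rules_from_ir(ir: &str) -> Result<Vec<(String, Expr)>, String> {
    let mut reader = IrReader { lines: ir.lines().peekable() };
    let mut rules = Vec::new();
    loop {
        match reader.token()? {
            "EOP" => return Ok(rules),
            "RULE" => {
                let name = reader.token()?.to_string();
                rules.push((name, reader.expr()?));
            }
            other => return Err(format!("expected RULE or EOP, got {other}")),
        }
    }
}
```

For instance, following those transformation rules, `main = 'a' / .` should produce the tokens `RULE`, `main`, `OR`, `AND`, `ATOM`, `LIT`, `a`, `CLOSE_MEXPR`, `AND`, `ATOM`, `DOT`, `CLOSE_MEXPR`, `CLOSE_MEXPR`, `EOP`, one per line, which this sketch turns into an `Or` of two `And` blocks.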
Creating rules...
```rust
extern crate dpr;

fn main() -> Result<(), dpr::Error> {
    let result = dpr::Peg::new(
        "
        main    =   char+
        char    =   'a'     -> A
                /   'b'     -> B
                /   .
        ",
    )
    .gen_rules()?
    // .parse("aaacbbabdef")?
    // .replace()?
    //  ...
    ;

    println!("{:#?}", result);
    Ok(())
}
```
This produces a set of rules like:
```text
SetOfRules(
    {
        "main": And(
            MultiExpr(
                [
                    Repeat(
                        RepInfo {
                            expression: RuleName(
                                "char",
                            ),
                            min: NRep(
                                1,
                            ),
                            max: None,
                        },
                    ),
                ],
            ),
        ),
        "char": Or(
            MultiExpr(
                [
                    And(
                        MultiExpr(
                            [
                                MetaExpr(
                                    Transf2(
                                        Transf2Expr {
                                            mexpr: MultiExpr(
                                                [
                                                    Simple(
                                                        Literal(
                                                            "a",
                                                        ),
                                                    ),
                                                ],
                                            ),
                                            transf2_rules: "A",
                                        },
                                    ),
                                ),
                            ],
                        ),
                    ),
                    And(
                        MultiExpr(
                            [
                                MetaExpr(
                                    Transf2(
                                        Transf2Expr {
                                            mexpr: MultiExpr(
                                                [
                                                    Simple(
                                                        Literal(
                                                            "b",
                                                        ),
                                                    ),
                                                ],
                                            ),
                                            transf2_rules: "B",
                                        },
                                    ),
                                ),
                            ],
                        ),
                    ),
                    And(
                        MultiExpr(
                            [
                                Simple(
                                    Dot,
                                ),
                            ],
                        ),
                    ),
                ],
            ),
        ),
    },
)
```
This set of rules will let us parse any input and generate its AST.

Next step: parsing the input with the generated rules...
Creating rules...
(With a simplified input, to reduce the output size)
```rust
extern crate dpr;

fn main() -> Result<(), dpr::Error> {
    let result = dpr::Peg::new(
        "
        main    =   char+
        char    =   'a'     -> A
                /   'b'     -> B
                /   .
        ",
    )
    .gen_rules()?
    .parse("acb")?
    // .replace()?
    //  ...
    ;

    println!("{:#?}", result);
    Ok(())
}
```
Now you can see the produced AST:
```text
Rule(
    (
        "main",
        [
            Rule(
                (
                    "char",
                    [
                        Transf2(
                            (
                                "A",
                                [
                                    Val(
                                        "a",
                                    ),
                                ],
                            ),
                        ),
                    ],
                ),
            ),
            Rule(
                (
                    "char",
                    [
                        Val(
                            "c",
                        ),
                    ],
                ),
            ),
            Rule(
                (
                    "char",
                    [
                        Transf2(
                            (
                                "B",
                                [
                                    Val(
                                        "b",
                                    ),
                                ],
                            ),
                        ),
                    ],
                ),
            ),
        ],
    ),
)
```
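The replace step then walks this tree. A simplified Rust sketch follows: the `Node` enum and the `replace` function are hypothetical and only mirror the shape of the debug output, and the replacement text of a `Transf2` node is taken literally (real templates can contain `$(...)` tokens referring to the children).

```rust
/// Hypothetical AST node, shaped like the debug output above.
enum Node {
    Val(String),                // matched text
    Rule(String, Vec<Node>),    // rule name and children
    Transf2(String, Vec<Node>), // replacement text and the children it replaces
}

/// Produces the output text: plain values are copied, rules concatenate their
/// children, and a transformation replaces its children with its own text.
fn replace(node: &Node) -> String {
    match node {
        Node::Val(text) => text.clone(),
        Node::Rule(_, children) => children.iter().map(replace).collect(),
        Node::Transf2(text, _) => text.clone(),
    }
}
```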
And running the transformations...
```rust
extern crate dpr;

fn main() -> Result<(), dpr::Error> {
    let result = dpr::Peg::new(
        "
        main    =   char+
        char    =   'a'     -> A
                /   'b'     -> B
                /   .
        ",
    )
    .gen_rules()?
    .parse("acb")?
    .replace()?
    //  ...
    ;

    println!("{:#?}", result);
    Ok(())
}
```
```txt
"AcB"
```