⚠️ Warning ☣️
Homemade, hand-rolled code ahead. Experimental. May not function as advertised.
parser-compose is a library for writing and composing parsers for arbitrary file or data formats. It has a strong focus on usability and API design.
It's based on the ideas around parser combinators and parsing expression grammars, but don't let those terms scare you off.
I made this because many of my projects involve parsing something (like a configuration file, a binary format, or an HTTP message), but the field of parsing and language theory sounds incredibly dull to me. I have always resorted to `string.split()`-style parsing, but it is tedious.
Turns out there is a better way! No theory required.
Say you want to extract the letter 'a' from a sequence of bytes. You can write a parser function that takes the bytes as an argument and returns successfully if it sees the byte `97` (ASCII for 'a') at the start:

Note: I am not handling edge cases, to keep things brief.
```rust
// If we find `97` at the start of the sequence, extract it and put it
// in a tuple. Return the remaining input as well.
fn match_a(input: &[u8]) -> Result<(u8, &[u8]), String> {
    if input[0] == 97 {
        Ok((input[0], &input[1..]))
    } else {
        Err("could not find 97".to_string())
    }
}

fn main() {
    let msg = &b"abc"[..];
    let (value, remaining) = match_a(msg).unwrap();
    println!("{value}");
    // -> 97
    println!("{remaining:?}");
    // -> [98, 99]
}
```
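As the note above says, these snippets skip edge cases. For instance, `input[0]` panics on an empty slice. Here is a minimal sketch of one way to handle that, which is not necessarily how this crate does it. The variant is named `match_a_checked` for illustration only, and it uses the standard library's `slice::split_first` to return an error instead of panicking:

```rust
// Hypothetical variant of `match_a` that reports an error on empty input
// instead of panicking.
fn match_a_checked(input: &[u8]) -> Result<(u8, &[u8]), String> {
    // `split_first` returns `None` for an empty slice, so nothing is indexed
    match input.split_first() {
        Some((&first, rest)) if first == b'a' => Ok((first, rest)),
        Some((&first, _)) => Err(format!("expected 97, found {first}")),
        None => Err("expected 97, found end of input".to_string()),
    }
}

fn main() {
    println!("{:?}", match_a_checked(b"abc")); // Ok((97, [98, 99]))
    println!("{:?}", match_a_checked(b"xyz")); // Err("expected 97, found 120")
    println!("{:?}", match_a_checked(b"")); // Err("expected 97, found end of input")
}
```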
Ok, but what if you wanted to parse `98` now? You can write a function that builds a parser. The argument to this parser builder is the byte you'd like to recognize, and the builder returns a parser that accepts that byte.
```rust
fn match_u8(expected: u8) -> impl Fn(&[u8]) -> Result<(u8, &[u8]), String> {
    move |input: &[u8]| {
        if input[0] == expected {
            Ok((input[0], &input[1..]))
        } else {
            Err(format!("could not find {expected}"))
        }
    }
}

fn main() {
    let msg = &b"abc"[..];
    let (value, remaining) = match_u8(b'a')(msg).unwrap();
    println!("{value}");
    // -> 97
    println!("{remaining:?}");
    // -> [98, 99]

    // Note how we parse `remaining` instead of `msg` here.
    // This is how you "move" through the input.
    let (value, remaining) = match_u8(b'b')(remaining).unwrap();
    println!("{value}");
    // -> 98
    println!("{remaining:?}");
    // -> [99]
}
```
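The expected byte is just an argument to the builder, so the same pattern works for any test on a byte. Here is a minimal sketch that assumes a hypothetical `match_if` builder, which is not part of this crate's documented API. It takes a predicate instead of a fixed byte, and it uses `split_first` so that empty input yields an error rather than a panic:

```rust
// Hypothetical builder: returns a parser that accepts one byte for which
// `pred` returns true.
fn match_if<F>(pred: F) -> impl Fn(&[u8]) -> Result<(u8, &[u8]), String>
where
    F: Fn(u8) -> bool,
{
    move |input: &[u8]| match input.split_first() {
        Some((&first, rest)) if pred(first) => Ok((first, rest)),
        Some((&first, _)) => Err(format!("unexpected byte {first}")),
        None => Err("unexpected end of input".to_string()),
    }
}

fn main() {
    // Accept any ASCII digit
    let digit = match_if(|b| b.is_ascii_digit());
    let (value, remaining) = digit(&b"42"[..]).unwrap();
    println!("{value}"); // 52, the ASCII code for '4'
    println!("{remaining:?}"); // [50]
}
```

With this builder, `match_if(|b| b == b'a')` accepts exactly the inputs that `match_u8(b'a')` accepts, so the fixed-byte builder becomes a special case.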
For the final touch: what if you wanted to recognize `97` or `98`? We can write a function that ... uhh ... combines (hint, hint, wink, wink) two parsers it gets as arguments. When this combiner function is called, it returns a parser that succeeds with the value of the first inner parser that succeeds.
```rust
// referred to as a "parser combinator"
fn or<A, B>(a: A, b: B) -> impl Fn(&[u8]) -> Result<(u8, &[u8]), String>
where
    A: Fn(&[u8]) -> Result<(u8, &[u8]), String>,
    B: Fn(&[u8]) -> Result<(u8, &[u8]), String>,
{
    // Try `a` first; only if it fails, try `b` on the same input
    move |input: &[u8]| a(input).or_else(|_| b(input))
}

fn match_u8(expected: u8) -> impl Fn(&[u8]) -> Result<(u8, &[u8]), String> {
    move |input: &[u8]| {
        if input[0] == expected {
            Ok((input[0], &input[1..]))
        } else {
            Err(format!("could not find {expected}"))
        }
    }
}

fn main() {
    let msg = &b"abc"[..];
    let (value, remaining) = or(
        match_u8(b'a'),
        match_u8(b'b')
    )(msg).unwrap();
    println!("{value}");
    // -> 97
    println!("{remaining:?}");
    // -> [98, 99]

    let (value, remaining) = or(
        match_u8(b'a'),
        match_u8(b'b')
    )(remaining).unwrap();
    println!("{value}");
    // -> 98
    println!("{remaining:?}");
    // -> [99]
}
```
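`or` returns an ordinary parser, so that parser can itself be handed to `or`. This is where the "compose" part pays off. The sketch below nests two `or` calls to recognize 'a', 'b', or 'c'. It then applies the combined parser repeatedly, moving through the input the same way `main` does above.

A few assumptions apply:
- The `or` body here is one plausible implementation of the behavior described earlier, not necessarily this crate's.
- `match_u8` uses `split_first`, so running out of input produces an error instead of a panic.

```rust
// `match_u8` as above, but `split_first` makes empty input an error, not a panic.
fn match_u8(expected: u8) -> impl Fn(&[u8]) -> Result<(u8, &[u8]), String> {
    move |input: &[u8]| match input.split_first() {
        Some((&first, rest)) if first == expected => Ok((first, rest)),
        _ => Err(format!("could not find {expected}")),
    }
}

// One possible `or`: try `a`, and only if it fails, try `b` on the same input.
fn or<A, B>(a: A, b: B) -> impl Fn(&[u8]) -> Result<(u8, &[u8]), String>
where
    A: Fn(&[u8]) -> Result<(u8, &[u8]), String>,
    B: Fn(&[u8]) -> Result<(u8, &[u8]), String>,
{
    move |input: &[u8]| a(input).or_else(|_| b(input))
}

fn main() {
    // `or` returns an ordinary parser, so it can be an argument to another `or`
    let a_b_or_c = or(match_u8(b'a'), or(match_u8(b'b'), match_u8(b'c')));

    let mut input = &b"cab!"[..];
    // Keep applying the combined parser until it fails
    while let Ok((value, remaining)) = a_b_or_c(input) {
        println!("{}", value as char); // prints c, then a, then b
        input = remaining;
    }
    println!("{input:?}"); // [33], the '!' that no branch accepts
}
```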
That is the basic idea. You can now go crazy writing all sorts of combinators like `and()` and `optional()`, and use them to combine your parsers together ... or you could use this crate :)
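For a rough idea of what such combinators could look like, here is a minimal sketch of `and` and `optional` in the same style as `or`. The names come from the paragraph above, but the signatures and semantics are assumptions, not this crate's API. To keep the types simple, every inner parser still produces a single `u8`.

```rust
// `match_u8` with `split_first`, so empty input is an error rather than a panic.
fn match_u8(expected: u8) -> impl Fn(&[u8]) -> Result<(u8, &[u8]), String> {
    move |input: &[u8]| match input.split_first() {
        Some((&first, rest)) if first == expected => Ok((first, rest)),
        _ => Err(format!("could not find {expected}")),
    }
}

// Hypothetical `and`: runs `first`, then runs `second` on whatever `first`
// left over, and succeeds with both values only if both parsers succeed.
fn and<A, B>(first: A, second: B) -> impl Fn(&[u8]) -> Result<((u8, u8), &[u8]), String>
where
    A: Fn(&[u8]) -> Result<(u8, &[u8]), String>,
    B: Fn(&[u8]) -> Result<(u8, &[u8]), String>,
{
    move |input: &[u8]| {
        let (x, rest) = first(input)?;
        let (y, rest) = second(rest)?;
        Ok(((x, y), rest))
    }
}

// Hypothetical `optional`: never fails. If `parser` fails, it yields `None`
// and leaves the input untouched.
fn optional<P>(parser: P) -> impl Fn(&[u8]) -> Result<(Option<u8>, &[u8]), String>
where
    P: Fn(&[u8]) -> Result<(u8, &[u8]), String>,
{
    move |input: &[u8]| match parser(input) {
        Ok((value, rest)) => Ok((Some(value), rest)),
        Err(_) => Ok((None, input)),
    }
}

fn main() {
    // 'a' followed by 'b'
    let ab = and(match_u8(b'a'), match_u8(b'b'));
    println!("{:?}", ab(&b"abc"[..])); // Ok(((97, 98), [99]))

    // An optional leading '-' sign
    let sign = optional(match_u8(b'-'));
    println!("{:?}", sign(&b"-5"[..])); // Ok((Some(45), [53]))
    println!("{:?}", sign(&b"5"[..])); // Ok((None, [53]))
}
```

A fuller version would be generic over the output type, so that `and` could accept another `and` as an argument. Fixing everything to `u8` keeps this sketch short.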
This crate would not have been possible without:

- pom, which lays out the various approaches to writing parser combinators in Rust.