Parser Compose

⚠️ Warning ☣️
Homemade, hand-rolled code ahead. Experimental. May not function as advertised.

Documentation

Examples

parser-compose is a library for writing and composing parsers for arbitrary file or data formats. It has a strong focus on usability and API design.

It's based on the ideas around parser combinators and parsing expression grammars, but don't let those terms scare you off.

I made this because many of my projects involve parsing something (like a configuration file, a binary format, an HTTP message), but the field of parsing and language theory sounds incredibly dull to me. I have always resorted to `string.split()`-type parsing, but it is tedious.

Turns out there is a better way! No theory required.

Crash course in parser combinators

Say you want to extract the letter 'a' from a sequence of bytes. You can write a parser function that takes the bytes as its argument and returns successfully if it sees the byte 97 (ASCII for 'a') at the start:

Note: to keep things brief, I am not handling edge cases such as empty input.

```rust
// If we find `97` at the start of the sequence, extract it and put it
// in a tuple. Return the remaining input as well.
fn match_a(input: &[u8]) -> Result<(u8, &[u8]), String> {
    if input[0] == 97 {
        Ok((input[0], &input[1..]))
    } else {
        Err(format!("could not find 97"))
    }
}

fn main() {
    let msg = &b"abc"[..];
    let (value, remaining) = match_a(msg).unwrap();
    println!("{value}"); // -> 97
    println!("{remaining:?}"); // -> [98, 99]
}
```
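The example above indexes `input[0]` directly, so it panics on empty input. As a minimal sketch of how that edge case could be handled, the variant below uses the standard slice method `split_first`. The name `match_a_checked` and the error messages are made up for illustration and are not part of this crate.

```rust
// Hypothetical variant of `match_a` that returns an error instead of
// panicking when the input is empty.
fn match_a_checked(input: &[u8]) -> Result<(u8, &[u8]), String> {
    // `split_first` returns `None` for an empty slice, otherwise the
    // first element and the rest of the slice.
    match input.split_first() {
        Some((&first, rest)) if first == b'a' => Ok((first, rest)),
        Some((&first, _)) => Err(format!("expected 97, found {first}")),
        None => Err("expected 97, found end of input".to_string()),
    }
}
```

Calling `match_a_checked(b"abc")` succeeds with `97` and the remaining bytes, while `match_a_checked(b"")` returns an `Err` rather than panicking.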

Ok, but what if you wanted to parse 98 now? You can write a function that builds a parser. The argument to this parser builder will be the byte you'd like to recognize. The parser builder will return a parser that accepts that byte.

```rust
fn match_u8(expected: u8) -> impl Fn(&[u8]) -> Result<(u8, &[u8]), String> {
    move |input: &[u8]| {
        if input[0] == expected {
            Ok((input[0], &input[1..]))
        } else {
            Err(format!("could not find {expected}"))
        }
    }
}

fn main() {
    let msg = &b"abc"[..];
    let (value, remaining) = match_u8(b'a')(msg).unwrap();
    println!("{value}"); // -> 97
    println!("{remaining:?}"); // -> [98, 99]

    // Note how we parse `remaining` instead of `msg` here.
    // This is how you "move" through the input.
    let (value, remaining) = match_u8(b'b')(remaining).unwrap();
    println!("{value}"); // -> 98
    println!("{remaining:?}"); // -> [99]
}
```

For the final touch: what if you wanted to recognize 97 or 98? We can write a function that ... uhh ... combines (hint, hint, wink, wink) two parsers it gets as arguments. When this combiner function is called, it returns a parser that tries the inner parsers in order and succeeds with the value of the first one that succeeds.

```rust
// Referred to as a "parser combinator"
fn or<P1, P2>(parser1: P1, parser2: P2) -> impl Fn(&[u8]) -> Result<(u8, &[u8]), String>
where
    P1: Fn(&[u8]) -> Result<(u8, &[u8]), String>,
    P2: Fn(&[u8]) -> Result<(u8, &[u8]), String>,
{
    move |input: &[u8]| {
        // Try the first parser; only fall back to the second if it fails.
        match parser1(input) {
            Ok((value, rest)) => Ok((value, rest)),
            Err(_) => match parser2(input) {
                Ok((value, rest)) => Ok((value, rest)),
                Err(e) => Err(e),
            },
        }
    }
}

fn match_u8(expected: u8) -> impl Fn(&[u8]) -> Result<(u8, &[u8]), String> {
    move |input: &[u8]| {
        if input[0] == expected {
            Ok((input[0], &input[1..]))
        } else {
            Err(format!("could not find {expected}"))
        }
    }
}

fn main() {
    let msg = &b"abc"[..];
    let (value, remaining) = or(
        match_u8(b'a'),
        match_u8(b'b')
    )(msg).unwrap();
    println!("{value}"); // -> 97
    println!("{remaining:?}"); // -> [98, 99]

    // Parse `remaining` this time, so the second alternative matches.
    let (value, remaining) = or(
        match_u8(b'a'),
        match_u8(b'b')
    )(remaining).unwrap();
    println!("{value}"); // -> 98
    println!("{remaining:?}"); // -> [99]
}
```

That is the basic idea. You can now go crazy writing all sorts of combinators like and() and optional(), and use them to combine your parsers together ... or you could use this crate :)
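To make that concrete, here is a rough sketch of what hand-written `and()` and `optional()` combinators could look like in the style of the crash course. These signatures are my own illustration and are not this crate's API. The `match_u8` builder is repeated so the block stands alone (here it uses `split_first` so it does not panic on empty input), and `demo` is a hypothetical helper you would call from your own `main`.

```rust
// Parser builder from the crash course, written with `split_first`
// so that empty input yields an error instead of a panic.
fn match_u8(expected: u8) -> impl Fn(&[u8]) -> Result<(u8, &[u8]), String> {
    move |input: &[u8]| match input.split_first() {
        Some((&first, rest)) if first == expected => Ok((first, rest)),
        _ => Err(format!("could not find {expected}")),
    }
}

// Illustrative `and`: runs `parser1`, then runs `parser2` on whatever
// `parser1` left over. Fails if either parser fails.
fn and<P1, P2>(parser1: P1, parser2: P2) -> impl Fn(&[u8]) -> Result<((u8, u8), &[u8]), String>
where
    P1: Fn(&[u8]) -> Result<(u8, &[u8]), String>,
    P2: Fn(&[u8]) -> Result<(u8, &[u8]), String>,
{
    move |input: &[u8]| {
        let (first, rest) = parser1(input)?;
        let (second, rest) = parser2(rest)?;
        Ok(((first, second), rest))
    }
}

// Illustrative `optional`: never fails. If the inner parser fails, it
// yields `None` and leaves the input untouched.
fn optional<P>(parser: P) -> impl Fn(&[u8]) -> Result<(Option<u8>, &[u8]), String>
where
    P: Fn(&[u8]) -> Result<(u8, &[u8]), String>,
{
    move |input: &[u8]| match parser(input) {
        Ok((value, rest)) => Ok((Some(value), rest)),
        Err(_) => Ok((None, input)),
    }
}

// Hypothetical usage helper; call it from `main` to see the output.
fn demo() {
    let msg = &b"abc"[..];
    let (pair, remaining) = and(match_u8(b'a'), match_u8(b'b'))(msg).unwrap();
    println!("{pair:?}"); // -> (97, 98)
    println!("{remaining:?}"); // -> [99]

    let (maybe, remaining) = optional(match_u8(b'x'))(remaining).unwrap();
    println!("{maybe:?}"); // -> None
    println!("{remaining:?}"); // -> [99]
}
```

Each combinator in this sketch needs its own output type (`(u8, u8)`, `Option<u8>`), which is one reason writing them all by hand gets repetitive quickly.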

Similar projects

Thanks

This crate would not have been possible without: