JSON parser which picks up values directly without performing tokenization in Rust
Pikkr is a JSON parser written in Rust that picks up values directly without performing tokenization. It is implemented based on Y. Li, N. R. Katsipoulakis, B. Chandramouli, J. Goldstein, and D. Kossmann. Mison: A Fast JSON Parser for Data Analytics. In VLDB, 2017.
This JSON parser extracts values from a JSON record without using finite state machines (FSMs) or performing tokenization. It parses JSON records in the following procedures:
This JSON parser performs well when a JSON data stream or JSON collection contains only a limited number of distinct structural variants, which is a common case in the data analytics field.
Please read the paper mentioned in the opening paragraph for the details of the JSON parsing algorithm.
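As a rough illustration of what "picking up values directly" relies on, the sketch below records the byte offsets of the colons that separate field names from values at a given nesting level, ignoring colons inside string literals. It is not Pikkr's code: the paper builds this kind of index with SIMD instructions and bitmaps, whereas this is a plain scalar loop, and the function name `colon_positions` and its `level` parameter are hypothetical names chosen for the example.

```rust
/// Illustrative sketch only (hypothetical helper, not part of Pikkr's API).
/// Returns the byte offsets of colons at nesting depth `level`
/// (1 = fields of the top-level object) that lie outside string literals.
fn colon_positions(rec: &[u8], level: usize) -> Vec<usize> {
    let mut positions = Vec::new();
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (i, &b) in rec.iter().enumerate() {
        if in_string {
            if escaped {
                // The previous byte was a backslash, so this byte is literal.
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'{' | b'[' => depth += 1,
            b'}' | b']' => depth = depth.saturating_sub(1),
            b':' if depth == level => positions.push(i),
            _ => {}
        }
    }
    positions
}

fn main() {
    // The colon inside the string "a:b" must not be indexed.
    let rec = br#"{"f1": "a:b", "f2": {"f1": 1, "f2": true}}"#;
    println!("level 1: {:?}", colon_positions(rec, 1));
    println!("level 2: {:?}", colon_positions(rec, 2));
}
```

With offsets like these, a parser can jump from a queried field's colon straight to the bytes of its value instead of walking the whole record token by token.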
Model Name: MacBook Pro
Processor Name: Intel Core i7
Processor Speed: 3.3 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per Core): 256 KB
L3 Cache: 4 MB
Memory: 16 GB
```rust
extern crate pikkr;

fn main() {
    let queries = vec![
        "$.f1".as_bytes(),
        "$.f2.f1".as_bytes(),
    ];
    let train_num = 2; // Number of records used as training data
                       // before Pikkr starts speculative parsing.
    let mut p = pikkr::Pikkr::new(&queries, train_num);
    let recs = vec![
        r#"{"f1": "a", "f2": {"f1": 1, "f2": true}}"#,
        r#"{"f1": "b", "f2": {"f1": 2, "f2": true}}"#,
        r#"{"f1": "c", "f2": {"f1": 3, "f2": true}}"#, // Speculative parsing starts from this record.
        r#"{"f2": {"f2": true, "f1": 4}, "f1": "d"}"#,
        r#"{"f2": {"f2": true, "f1": 5}}"#,
        r#"{"f1": "e"}"#
    ];
    for rec in recs {
        let results = p.parse(rec.as_bytes());
        for result in results {
            print!("{} ", match result {
                Some(result) => String::from_utf8(result.to_vec()).unwrap(),
                None => String::from("None"),
            });
        }
        println!();
    }
}
```
```bash
$ cargo --version
cargo 0.22.0-nightly (3d3f2c05d 2017-08-27) # Make sure that nightly release is being used.
$ RUSTFLAGS="-C target-cpu=native" cargo build --release
```
```bash
$ ./target/release/[package name]
"a" 1
"b" 2
"c" 3
"d" 4
None 5
"e" None
```