Rust and mmap
The library is published on crates.io as mmapjsonfile and helps with counting and filtering JSON files whose records are all symmetric in structure (a JSON array of objects), in the format below.
[{..}, {..}, ..]
The idea of using memory-mapped I/O is to check the performance of filtering, creating another file, and so on from Rust while putting serde in harm's way :grin:.
Here is the best read on the topic from Linux forums.
Functionality
- Count the number of records in a JSON file.
- Count the number of records matching a filter.
- Filter the JSON file with a condition (provided by the caller) and save the result to a specified file.
- Collect the distinct values of a key.
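As a rough illustration of the counting and filtering features above, here is a minimal sketch that works directly on a byte slice, the kind of view a memory map typically exposes, without parsing into serde values. It uses only the standard library. The names `records`, `count`, `count_with_filter`, `filter_to`, and `contains` are hypothetical and are not this crate's API; the crate's actual implementation may work quite differently.

```rust
use std::io::{self, Write};

/// Splits a JSON array of objects into one byte slice per top-level object.
/// Tracks brace depth plus string/escape state, so braces that appear
/// inside string values are not mistaken for object boundaries.
fn records(bytes: &[u8]) -> Vec<&[u8]> {
    let mut out = Vec::new();
    let mut depth = 0usize; // current `{` nesting depth
    let mut in_string = false;
    let mut escaped = false;
    let mut start = 0usize;

    for (i, &b) in bytes.iter().enumerate() {
        if in_string {
            if escaped {
                escaped = false; // this byte was escaped; skip it
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'{' => {
                if depth == 0 {
                    start = i; // a new top-level record begins here
                }
                depth += 1;
            }
            b'}' if depth > 0 => {
                depth -= 1;
                if depth == 0 {
                    out.push(&bytes[start..=i]);
                }
            }
            _ => {}
        }
    }
    out
}

/// Naive substring test, standing in for a caller-provided condition.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// Counts all records.
fn count(bytes: &[u8]) -> usize {
    records(bytes).len()
}

/// Counts the records for which the caller's predicate returns true.
fn count_with_filter<F: Fn(&[u8]) -> bool>(bytes: &[u8], keep: F) -> usize {
    records(bytes).into_iter().filter(|r| keep(*r)).count()
}

/// Writes the matching records to `out` as a JSON array and returns how many were written.
fn filter_to<W: Write, F: Fn(&[u8]) -> bool>(bytes: &[u8], keep: F, mut out: W) -> io::Result<usize> {
    let mut written = 0;
    out.write_all(b"[")?;
    for rec in records(bytes) {
        if keep(rec) {
            if written > 0 {
                out.write_all(b",")?;
            }
            out.write_all(rec)?;
            written += 1;
        }
    }
    out.write_all(b"]")?;
    Ok(written)
}
```

Because every function takes `&[u8]`, the same code can run over an in-memory buffer in tests and over a mapped file in real use. The substring predicate is deliberately crude: it matches raw bytes, so it is sensitive to whitespace and key order in the input.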
Test
The airports JSON has been taken from the location. It needs to be downloaded and put in the data/ directory.
Thanks to the original mmap lib.
General performance seems to be of the order below.
(All tests below were run on a MacBook.)
cargo test --release -- --nocapture --test-threads 1
Debug
- counttestsimplenestedjson: 256.37µs seconds for counting 1 records
- counttestsimplewithfilter_json: 322.471µs seconds.
- counttestsimplenestedwithfilterjson: 396.664µs seconds
- counttestsimple_json: 258.594µs seconds for counting 1 records
- filteroutjsonnoresults: 675.752µs seconds.
- filteroutjsonbyvalue: 785.313µs seconds.
- count_airports: 690.802302ms seconds.
- countwithfilter_airports: 3.913697422s seconds for filtering 57265 records
- filteroutairportsincountry 3.91415908s: seconds for filtering 57265 records
- filteroutairportsnoresults: 3.922528546s seconds for filtering 57265 records
Release (1.1 GB file, built by appending the 16 MB JSON multiple times; 3,355,711 records)
- countairports ... countairports: 1.153728577s seconds.
- counttestsimplejson ... counttestsimplejson: 105.415µs seconds for counting 1 records
- counttestsimplenestedjson ... counttestsimplenestedjson: 137.288µs seconds for counting 1 records
- counttestsimplenestedwithfilterjson ... counttestsimplenestedwithfilterjson: 156.865µs seconds
- counttestsimplewithfilterjson ... counttestsimplewithfilterjson: 85.541µs seconds.
- filteroutjsonbyvalue ... filteroutjsonbyvalue: 697.84µs seconds.
- filteroutjsonnoresults ... filteroutjsonnoresults: 380.902µs seconds.
- writedistinctfields ... writedistinctfields: 576.174µs seconds.
- testsumoverfield ... testsumoverfield: 104.42µs seconds.
- countwithfilterairports ... countwithfilterairports: 17.461620452s seconds.
- filteroutairportsincountry ... filteroutairportsincountry 17.580610223s: seconds.
- filteroutairportsnoresults ... filteroutairportsnoresults: 17.333596128s seconds.
- testsumoverfieldairportelevationft ... testsumover_field: 17.291048913s seconds.
- writedistinctfieldslargejson ... writedistinctfields: 22.316755059s seconds.
test result: ok. 14 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Doc-tests mmapjsonfile
running 5 tests
test src/lib.rs - count (line 294) ... ok
test src/lib.rs - countwithfilter (line 197) ... ok
test src/lib.rs - distinctoffield (line 371) ... ok
test src/lib.rs - filter (line 47) ... ok
test src/lib.rs - sumoverfield (line 508) ... ok
test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out