
# membuffer

A Rust library for rapid deserialization of huge datasets with few keys. The library is meant to be used with memory-mapped (mmapped) files: almost every crate on crates.io that does serialization and deserialization needs to process the whole structure, which makes those crates unusable with large memory-mapped files. For this reason, this library only scans the header to get the schema of the data structure and leaves all other fields untouched unless it is specifically asked to fetch them.

This data structure is optimized for deserialization: it does not parse the fields, and is therefore extremely fast when deserializing big strings.

Use cases:
- Handling very big strings
- Saving huge datasets with few keys on disk
- Memory-mapped data structures, since fields are not read until requested and therefore won't cause page faults (see the sketch after this list)
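
As a rough illustration of the memory-mapped use case, the sketch below writes a finalized buffer to disk and reads it back through a memory map. The `memmap2` crate, the file name, and the way the mapped bytes are handed to the reader are assumptions made for illustration; only `MemBufferWriter`, `finalize`, `MemBufferReader::new`, and `get_string_field` come from the examples further down.

```rust
use membuffer::{MemBufferWriter, MemBufferReader};
use memmap2::Mmap; // assumption: any mmap crate exposing the file as &[u8] works here
use std::fs::File;

fn main() -> std::io::Result<()> {
    // Write a buffer with one large field to disk (file name is hypothetical).
    let mut writer = MemBufferWriter::new();
    writer.add_string_entry("payload", &"a".repeat(1_000_000));
    std::fs::write("dataset.membuf", writer.finalize())?;

    // Map the file and hand the raw bytes to the reader; only the header is
    // scanned here, the 1 MB payload stays untouched until it is requested.
    let file = File::open("dataset.membuf")?;
    let mmap = unsafe { Mmap::map(&file)? };
    let reader = MemBufferReader::new(&mmap[..]).unwrap();
    assert_eq!(reader.get_string_field("payload").unwrap().len(), 1_000_000);
    Ok(())
}
```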

## Examples

```rust
use membuffer::{MemBufferWriter, MemBufferReader, MemBufferError};

fn main() {
    //Creates a new empty MemBufferWriter
    let mut writer = MemBufferWriter::new();

    //Adds this as immutable field, no more changing after adding it
    writer.add_string_entry("short_key", "short_value");

    //Creates a Vec out of all the collected data
    let result = writer.finalize();

    //Try to read the created vector. Will return an error if the CRC32 does not fit
    //or if the header is not terminated. Will panic if the memory is corrupted beyond recognition
    let reader = MemBufferReader::new(&result).unwrap();

    //Will return an error if the selected key could not be found or if the value types don't match
    assert_eq!(reader.get_string_field("short_key").unwrap(), "short_value");
}
```
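
Because `MemBufferReader::new` returns a `Result`, a corrupted buffer can be handled without `unwrap()`. The following is a minimal sketch, assuming that flipping the last payload byte is caught by the CRC32 check mentioned above rather than by the "corrupted beyond recognition" panic path; only the writer and reader calls themselves come from the example above.

```rust
use membuffer::{MemBufferWriter, MemBufferReader};

fn main() {
    let mut writer = MemBufferWriter::new();
    writer.add_string_entry("short_key", "short_value");
    let mut result = writer.finalize();

    // Flip the last byte; the assumption is that this byte is covered by the
    // CRC32 and the reader therefore returns an error instead of panicking.
    let last = result.len() - 1;
    result[last] ^= 0xFF;

    // Handle the Result instead of calling unwrap().
    match MemBufferReader::new(&result) {
        Ok(reader) => println!("{}", reader.get_string_field("short_key").unwrap()),
        Err(err) => eprintln!("buffer rejected: {:?}", err),
    }
}
```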

Example using serde in the data structure:

```rust
use membuffer::{MemBufferWriter, MemBufferReader};
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize)]
struct HeavyStruct {
    vec: Vec<i32>,
    name: String,
    frequency: i32,
    id: i32,
}

#[test]
fn check_serde_capability() {
    let value = HeavyStruct {
        vec: vec![100, 20, 1],
        name: String::from("membuffer!"),
        frequency: 10,
        id: 200,
    };
    let mut writer = MemBufferWriter::new();
    writer.add_serde_entry("heavy", &value);
    let result = writer.finalize();

    let reader = MemBufferReader::new(&result).unwrap();
    let struc: HeavyStruct = reader.get_serde_field("heavy").unwrap();

    assert_eq!(struc.vec, vec![100, 20, 1]);
    assert_eq!(struc.name, "membuffer!");
    assert_eq!(struc.frequency, 10);
    assert_eq!(struc.id, 200);
}
```

## Benchmark

Why is the library this fast? The benchmark consists of deserializing a data structure with different payload sizes: 1 MB, 10 MB, or 100 MB. membuffer loads only the data structure layout and returns slices into the strings instead of parsing the whole structure, which helps a great deal when working with memory-mapped structures. As the benchmarks show, the speed of membuffer depends only on the number of keys and not on the size of the deserialized data structure, which demonstrates that the complexity of deserialization is independent of the payload size.

Benchmark code:

```rust
#![feature(test)]
extern crate test;

use membuffer::{MemBufferWriter, MemBufferReader};
use serde::{Serialize, Deserialize};
use test::Bencher;

#[bench]
fn benchmark_few_keys_payload_1mb_times_3(b: &mut Bencher) {
    let mut huge_string = String::with_capacity(10_000_000);
    for _ in 0..1_000_000 {
        huge_string.push('a');
    }
    let mut writer = MemBufferWriter::new();
    writer.add_string_entry("one", &huge_string);
    writer.add_string_entry("two", &huge_string);
    writer.add_string_entry("three", &huge_string);
    let result = writer.finalize();
    assert!(result.len() > 3_000_000);

    b.iter(|| {
        let reader = MemBufferReader::new(&result).unwrap();
        let string1 = reader.get_string_field("one").unwrap();
        let string2 = reader.get_string_field("two").unwrap();
        let string3 = reader.get_string_field("three").unwrap();
        assert_eq!(string1.len(), 1_000_000);
        assert_eq!(string2.len(), 1_000_000);
        assert_eq!(string3.len(), 1_000_000);
    });
}

#[derive(Serialize, Deserialize)]
struct BenchSerde<'a> {
    one: &'a str,
    two: &'a str,
    three: &'a str,
}

#[bench]
fn benchmark_few_keys_payload_1mb_times_3_serde(b: &mut Bencher) {
    let mut huge_string = String::with_capacity(1_000_000);
    for _ in 0..1_000_000 {
        huge_string.push('a');
    }
    let first = BenchSerde {
        one: &huge_string,
        two: &huge_string,
        three: &huge_string,
    };

    let string = serde_json::to_string(&first).unwrap();

    b.iter(|| {
        let reader: BenchSerde = serde_json::from_str(&string).unwrap();
        assert_eq!(reader.one.len(), 1_000_000);
        assert_eq!(reader.two.len(), 1_000_000);
        assert_eq!(reader.three.len(), 1_000_000);
    });
}
```