`borsh`

Binary Object Representation Serializer for Hashing


Why do we need yet another serialization format? Borsh is the first serializer that prioritizes the following qualities, which are crucial for high-security projects:

* Consistent and specified binary representation:
  * Consistent means there is a bijective mapping between objects and their binary representations. No two binary representations deserialize into the same object. This is extremely useful for applications that use the binary representation to compute hashes;
  * Borsh comes with a full specification that can be used for implementations in other languages;
* Safe. Borsh implementations use safe coding practices. In Rust, Borsh uses only safe code;
* Speed. In Rust, Borsh achieves high performance by opting out of Serde, which makes it faster than bincode in most of the benchmarks below and also reduces the code size.

Example

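```rust
use borsh::{BorshSerialize, BorshDeserialize};

// Field types are inferred from the values used in the test below.
#[derive(BorshSerialize, BorshDeserialize, PartialEq, Debug)]
struct A {
    x: u64,
    y: String,
}

#[test]
fn test_simple_struct() {
    let a = A {
        x: 3301,
        y: "liber primus".to_string(),
    };
    // Serialize to bytes, deserialize back, and check the round trip.
    let encoded_a = a.try_to_vec().unwrap();
    let decoded_a = A::try_from_slice(&encoded_a).unwrap();
    assert_eq!(a, decoded_a);
}
```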
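To make the round trip above concrete, here is a minimal sketch that hand-encodes the same value using only the standard library, following the rules from the Specification section. It assumes `A` has a `u64` field `x` and a `String` field `y`. The helpers `encode_a` and `decode_a` are hypothetical and are not part of the borsh API.

```rust
use std::convert::TryInto;

// Hand-rolled sketch of the byte layout; not the borsh crate's API.
#[derive(PartialEq, Debug)]
struct A {
    x: u64, // assumed integer width
    y: String,
}

/// Writes `x` as 8 little-endian bytes, then `y` as a u32 length plus UTF-8 bytes.
fn encode_a(a: &A) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&a.x.to_le_bytes());
    out.extend_from_slice(&(a.y.len() as u32).to_le_bytes());
    out.extend_from_slice(a.y.as_bytes());
    out
}

/// Reads the layout above; returns `None` on truncated input,
/// trailing bytes, or invalid UTF-8.
fn decode_a(bytes: &[u8]) -> Option<A> {
    let x = u64::from_le_bytes(bytes.get(0..8)?.try_into().ok()?);
    let len = u32::from_le_bytes(bytes.get(8..12)?.try_into().ok()?) as usize;
    let end = 12usize.checked_add(len)?;
    if end != bytes.len() {
        return None;
    }
    let y = String::from_utf8(bytes.get(12..end)?.to_vec()).ok()?;
    Some(A { x, y })
}

#[test]
fn test_manual_layout() {
    let a = A { x: 3301, y: "liber primus".to_string() };
    let encoded = encode_a(&a);
    // 8 bytes for `x`, 4 bytes for the length prefix, 12 bytes of text.
    assert_eq!(encoded.len(), 24);
    assert_eq!(decode_a(&encoded), Some(a));
}
```

Because every field has exactly one encoding, rejecting trailing bytes in `decode_a` keeps the mapping between values and byte strings one-to-one in this sketch.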

Features

Opting out of Serde allows borsh to have some features that are currently not available for serde-compatible serializers. Currently we support two features: `borsh_init` and `borsh_skip` (the former is not available in Serde).

`borsh_init` allows automatically running an initialization function right after deserialization. This adds a lot of convenience for objects that are designed to be used as strictly immutable. Usage example:
```rust
// `CryptoKey`, `CryptoSignature`, and `CryptoHash` are placeholder types.
#[derive(BorshSerialize, BorshDeserialize)]
#[borsh_init(init)]
struct Message {
    message: String,
    timestamp: u64,
    public_key: CryptoKey,
    signature: CryptoSignature,
    hash: CryptoHash,
}

impl Message {
    // Runs right after deserialization: recompute the hash, then verify the signature.
    pub fn init(&mut self) {
        self.hash = CryptoHash::new().write_string(&self.message).write_u64(self.timestamp);
        self.signature.verify(&self.hash, &self.public_key);
    }
}
```
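The effect of `borsh_init` can be approximated by hand, which may help clarify what the attribute automates. The sketch below uses only the standard library. The trimmed-down `Message`, the `toy_hash` function (an FNV-1a style stand-in, not a cryptographic hash), and `decode_message` are all hypothetical names introduced for illustration.

```rust
use std::convert::TryInto;

// Toy stand-in for the `Message` above, without the signature fields.
#[derive(Debug, PartialEq)]
struct Message {
    message: String,
    timestamp: u64,
    hash: u64,
}

/// Placeholder hash over the message bytes and the little-endian timestamp.
fn toy_hash(message: &str, timestamp: u64) -> u64 {
    let ts = timestamp.to_le_bytes();
    let mut h: u64 = 0xcbf29ce484222325;
    for b in message.bytes().chain(ts.iter().copied()) {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

impl Message {
    /// Recomputes derived state instead of trusting the decoded value.
    fn init(&mut self) {
        self.hash = toy_hash(&self.message, self.timestamp);
    }
}

/// Decodes the fields in declaration order, then runs `init`.
fn decode_message(bytes: &[u8]) -> Option<Message> {
    let len = u32::from_le_bytes(bytes.get(0..4)?.try_into().ok()?) as usize;
    let text_end = 4usize.checked_add(len)?;
    let message = String::from_utf8(bytes.get(4..text_end)?.to_vec()).ok()?;
    let timestamp = u64::from_le_bytes(bytes.get(text_end..text_end + 8)?.try_into().ok()?);
    let hash = u64::from_le_bytes(bytes.get(text_end + 8..text_end + 16)?.try_into().ok()?);
    let mut decoded = Message { message, timestamp, hash };
    decoded.init(); // the step that `#[borsh_init(init)]` is described as running automatically
    Some(decoded)
}

#[test]
fn test_init_runs_after_decoding() {
    let mut bytes = Vec::new();
    bytes.extend_from_slice(&2u32.to_le_bytes());
    bytes.extend_from_slice(b"hi");
    bytes.extend_from_slice(&7u64.to_le_bytes());
    bytes.extend_from_slice(&0u64.to_le_bytes()); // stale hash value on the wire
    let decoded = decode_message(&bytes).unwrap();
    assert_eq!(decoded.hash, toy_hash("hi", 7));
}
```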

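`borsh_skip` allows skipping serialization/deserialization of fields, assuming they implement the `Default` trait, similarly to `#[serde(skip)]`.
```rust
// Illustrative struct: `y` is not written and is restored via `Default` when reading.
#[derive(BorshSerialize, BorshDeserialize)]
struct A {
    x: u64,
    #[borsh_skip]
    y: f32,
}
```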

Benchmarks

We measured the following benchmarks on objects that blockchain projects care about the most: blocks, block headers, transactions, accounts. We took the object structure from the nearprotocol blockchain. The benchmarks were run on a Google Cloud n1-highmem-16 instance (16 vCPUs, 104 GB memory) with Intel(R) Xeon(R) CPU @ 2.20GHz processors and 56320 KB cache, using one core for the actual benchmark execution. Version used for benchmarks: 0.2.0.

```
test ser_account_cbor ... bench: 536 ns/iter (+/- 8)
test ser_account_bincode ... bench: 150 ns/iter (+/- 7)
test ser_account_borsh ... bench: 42 ns/iter (+/- 4)
test ser_account_speedy ... bench: 40 ns/iter (+/- 7)

test ser_transaction_cbor ... bench: 35,374 ns/iter (+/- 815)
test ser_transaction_bincode ... bench: 26,749 ns/iter (+/- 1,375)
test ser_transaction_borsh ... bench: 14,160 ns/iter (+/- 614)
test ser_transaction_speedy ... bench: 840 ns/iter (+/- 46)

test ser_block_header_cbor ... bench: 211,129 ns/iter (+/- 4,477)
test ser_block_header_bincode ... bench: 186,559 ns/iter (+/- 14,868)
test ser_block_header_borsh ... bench: 26,196 ns/iter (+/- 1,224)
test ser_block_header_speedy ... bench: 25,540 ns/iter (+/- 2,172)

test ser_block_cbor ... bench: 31,438,399 ns/iter (+/- 4,456,689)
test ser_block_bincode ... bench: 22,405,977 ns/iter (+/- 767,936)
test ser_block_borsh ... bench: 12,722,433 ns/iter (+/- 1,067,208)
test ser_block_speedy ... bench: 767,713 ns/iter (+/- 32,926)

test de_account_cbor ... bench: 649 ns/iter (+/- 21)
test de_account_bincode ... bench: 110 ns/iter (+/- 2)
test de_account_borsh ... bench: 46 ns/iter (+/- 5)
test de_account_speedy ... bench: 12 ns/iter (+/- 0)

test de_transaction_bincode ... bench: 13,581 ns/iter (+/- 574)
test de_transaction_cbor ... bench: 18,910 ns/iter (+/- 704)
test de_transaction_borsh ... bench: 29,698 ns/iter (+/- 1,370)
test de_transaction_speedy ... bench: 1,249 ns/iter (+/- 57)

test de_block_header_cbor ... bench: 647,718 ns/iter (+/- 32,769)
test de_block_header_bincode ... bench: 182,284 ns/iter (+/- 14,020)
test de_block_header_borsh ... bench: 91,914 ns/iter (+/- 16,850)
test de_block_header_speedy ... bench: 84,948 ns/iter (+/- 14,968)

test de_block_cbor ... bench: 40,483,706 ns/iter (+/- 2,271,670)
test de_block_bincode ... bench: 10,804,396 ns/iter (+/- 407,032)
test de_block_borsh ... bench: 27,766,896 ns/iter (+/- 2,318,010)
test de_block_speedy ... bench: 2,199,706 ns/iter (+/- 649,436)
```

Specification

In short, Borsh is a non-self-describing binary serialization format. It is designed to serialize any object to a canonical and deterministic sequence of bytes.

General principles:

* integers are little endian;
* sizes of dynamic containers are written before the values as u32;
* all unordered containers (hashmap/hashset) are ordered in lexicographic order by key (with ties broken by value);
* structs are serialized in the order of the fields in the struct;
* enums are serialized using a u8 for the enum ordinal, followed by the data stored inside the enum value (if present).
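A few of these principles can be sketched with the standard library alone. The `Shape` enum and the `encode_*` helpers below are hypothetical names chosen for illustration; they are not part of the borsh API, and the ordinals simply follow declaration order.

```rust
// Hypothetical enum used only to illustrate the ordinal rule.
enum Shape {
    Point,                   // ordinal 0, no data
    Circle(u32),             // ordinal 1, one unnamed field
    Rect { w: u16, h: u16 }, // ordinal 2, two named fields
}

/// Dynamic container: u32 length prefix, then each element in little endian.
fn encode_vec_u16(values: &[u16], out: &mut Vec<u8>) {
    out.extend_from_slice(&(values.len() as u32).to_le_bytes());
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
}

/// Enum: u8 ordinal of the variant, then its fields in declaration order.
fn encode_shape(shape: &Shape, out: &mut Vec<u8>) {
    match shape {
        Shape::Point => out.push(0),
        Shape::Circle(r) => {
            out.push(1);
            out.extend_from_slice(&r.to_le_bytes());
        }
        Shape::Rect { w, h } => {
            out.push(2);
            out.extend_from_slice(&w.to_le_bytes());
            out.extend_from_slice(&h.to_le_bytes());
        }
    }
}

#[test]
fn test_general_principles() {
    let mut out = Vec::new();
    encode_vec_u16(&[1, 258], &mut out);
    // Length 2 as u32, then 1 and 258 (0x0102) as little-endian u16.
    assert_eq!(out, vec![2, 0, 0, 0, 1, 0, 2, 1]);

    let mut out = Vec::new();
    encode_shape(&Shape::Circle(5), &mut out);
    assert_eq!(out, vec![1, 5, 0, 0, 0]);
}
```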

| Informal type | Rust EBNF * | Pseudocode |
| --- | --- | --- |
| Integers | `integer_type: ["u8" \| "u16" \| "u32" \| "u64" \| "u128" \| "i8" \| "i16" \| "i32" \| "i64" \| "i128" ]` | `little_endian(x)` |
| Floats | `float_type: ["f32" \| "f64" ]` | `err_if_nan(x)`<br>`little_endian(x as integer_type)` |
| Unit | `unit_type: "()"` | We do not write anything |
| Fixed sized arrays | `array_type: '[' ident ';' literal ']'` | `for el in x`<br>`repr(el as ident)` |
| Dynamic sized array | `vec_type: "Vec<" ident '>'` | `repr(len() as u32)`<br>`for el in x`<br>`repr(el as ident)` |
| Struct | `struct_type: "struct" ident fields` | `repr(fields)` |
| Fields | `fields: [named_fields \| unnamed_fields]` | |
| Named fields | `named_fields: '{' ident_field0 ':' ident_type0 ',' ident_field1 ':' ident_type1 ',' ... '}'` | `repr(ident_field0 as ident_type0)`<br>`repr(ident_field1 as ident_type1)`<br>`...` |
| Unnamed fields | `unnamed_fields: '(' ident_type0 ',' ident_type1 ',' ... ')'` | `repr(x.0 as type0)`<br>`repr(x.1 as type1)`<br>`...` |
| Enum | `enum: 'enum' ident '{' variant0 ',' variant1 ',' ... '}'`<br>`variant: ident [ fields ] ?` | Suppose X is the number of the variant that the enum takes.<br>`repr(X as u8)`<br>`repr(x.X as fieldsX)` |
| HashMap | `hashmap: "HashMap<" ident0, ident1 ">"` | `repr(x.len() as u32)`<br>`for (k, v) in x.sorted_by_key() {`<br>`repr(k as ident0)`<br>`repr(v as ident1)`<br>`}` |
| HashSet | `hashset: "HashSet<" ident ">"` | `repr(x.len() as u32)`<br>`for el in x.sorted() {`<br>`repr(el as ident)`<br>`}` |
| Option | `option_type: "Option<" ident '>'` | `if x.is_some() {`<br>`repr(1 as u8)`<br>`repr(x.unwrap() as ident)`<br>`} else {`<br>`repr(0 as u8)`<br>`}` |
| String | `string_type: "String"` | `encoded = utf8_encoding(x) as Vec<u8>`<br>`repr(encoded.len() as u32)`<br>`repr(encoded as Vec<u8>)` |

Note:

* Some parts of the Rust grammar are not yet formalized, such as enums and variants. We derive the EBNF forms of the Rust grammar backwards from `syn` types;
* We had to extend EBNF repetitions: instead of defining them as `[ ident_field ':' ident_type ',' ]*` we define them as `ident_field0 ':' ident_type0 ',' ident_field1 ':' ident_type1 ',' ...` so that we can refer to individual elements in the pseudocode;
* We use the `repr()` function to denote that we are writing the representation of the given element into an imaginary buffer.
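As a rough illustration of the `Option`, float, and `HashMap` rows of the table, the following standard-library sketch writes into a `Vec<u8>` that plays the role of the imaginary buffer. The helper names are hypothetical, and the sketch assumes that "lexicographic order by key" corresponds to Rust's `Ord` on `String` keys.

```rust
use std::collections::HashMap;

/// Option: a u8 tag (1 = Some, 0 = None), followed by the value if present.
fn encode_option_u16(value: Option<u16>, out: &mut Vec<u8>) {
    match value {
        Some(v) => {
            out.push(1);
            out.extend_from_slice(&v.to_le_bytes());
        }
        None => out.push(0),
    }
}

/// Float: reject NaN, then write the bit pattern as a little-endian integer.
fn encode_f32(value: f32, out: &mut Vec<u8>) -> Result<(), String> {
    if value.is_nan() {
        return Err("NaN cannot be serialized".to_string());
    }
    out.extend_from_slice(&value.to_bits().to_le_bytes());
    Ok(())
}

/// String: u32 byte length, then the UTF-8 bytes.
fn encode_string(s: &str, out: &mut Vec<u8>) {
    out.extend_from_slice(&(s.len() as u32).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
}

/// HashMap: u32 entry count, then entries sorted by key so that
/// the iteration order of the map does not affect the output.
fn encode_map(map: &HashMap<String, u32>, out: &mut Vec<u8>) {
    let mut entries: Vec<(&String, &u32)> = map.iter().collect();
    entries.sort(); // keys are unique, so the value never decides the order here
    out.extend_from_slice(&(entries.len() as u32).to_le_bytes());
    for (k, v) in entries {
        encode_string(k, out);
        out.extend_from_slice(&v.to_le_bytes());
    }
}

#[test]
fn test_map_encoding_ignores_insertion_order() {
    let mut first = HashMap::new();
    first.insert("b".to_string(), 2u32);
    first.insert("a".to_string(), 1u32);
    let mut second = HashMap::new();
    second.insert("a".to_string(), 1u32);
    second.insert("b".to_string(), 2u32);

    let (mut out1, mut out2) = (Vec::new(), Vec::new());
    encode_map(&first, &mut out1);
    encode_map(&second, &mut out2);
    assert_eq!(out1, out2);
}
```

Sorting before writing is what makes the output usable as hash input: two maps with equal contents always produce the same bytes.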