This is a graph-based binary format serializer/deserializer generator for use in build.rs!

This is for interfacing with existing, externally developed binary format specifications. If you want to automatically serialize/deserialize your own data, use Protobuf or Bincode or something similar instead of this.

By avoiding a struct-macro-based interface, I think this is more flexible and simpler to use than the alternatives.

Goals

  1. Flexibility, supporting all the craziness people do with binary format specs

    I'm not sure it's possible to bound everything formats can do. This won't support everything from the get-go, but it's structured in such a way that individual new features don't require major rewrites.

  2. Easy to understand, composable elements

    The API stays simple and flexible by constructing a graph of reversible processing nodes.

  3. Precision

    Specifications are unambiguous, so binary representations with the same spec never change in future versions due to underspecification.

  4. Safety

Non-goals

Features

Current features

Want features

Not-in-the-short-run features

Example

A minimal versioned data container.

build.rs:

```rust
use std::{
    path::PathBuf,
    env,
    str::FromStr,
    fs,
};
use inarybay::scope::Endian;
use quote::quote;

pub fn main() {
    println!("cargo:rerun-if-changed=build.rs");
    let root = PathBuf::from_str(&env::var("CARGO_MANIFEST_DIR").unwrap()).unwrap();
    let schema = inarybay::schema::Schema::new();
    {
        let scope = schema.scope("scope", inarybay::schema::GenerateConfig {
            read: true,
            write: true,
            ..Default::default()
        });
        let version = scope.int("version_int", scope.fixed_range("version_bytes", 2), Endian::Big, false);
        let body = scope.remaining_bytes("data_bytes");
        let object = scope.object("obj", "Versioned");
        {
            object.add_type_attrs(quote!(#[derive(Clone, Debug, PartialEq)]));
            object.field("version", version);
            object.field("body", body);
        }
        scope.rust_root(object);
    }
    fs::write(root.join("src/versioned.rs"), schema.generate().as_bytes()).unwrap();
}
```

Then use it like:

main.rs

```rust
use std::fs::File;
use crate::versioned::versioned_read;

pub fn main() {
    let mut f = File::open("x.jpg.ver").unwrap();
    let v = versioned_read(&mut f).unwrap();
    println!("x.jpg.ver is {}", v.version);
    println!("x.jpg.ver length is {}", v.body.len());
}
```

Guide

Terminology: The serial-rust axis

The words "serial" and "rust" are scattered here and there.

Deserialization takes data from the "serial" side and moves towards the "rust" side, ending at the "rust root" which is the object to deserialize.

Serialization takes "rust" side data (the object) and transforms it through multiple nodes until it reaches the "serial root" - the file, or socket, or whatever.

Each link between nodes has a "serial" and a "rust" side.
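To make the direction concrete, here is a hand-written sketch of both traversals for the versioned-container example above. This is illustrative stdlib Rust, not Inarybay's generated output; the function bodies show what moving along the serial-rust axis amounts to for that format.

```rust
use std::io::{self, Read, Write};

// Rust-side type, as in the example schema.
struct Versioned {
    version: u16,
    body: Vec<u8>,
}

// Deserialization: from the serial side toward the rust root.
fn read_versioned(r: &mut impl Read) -> io::Result<Versioned> {
    let mut version_bytes = [0u8; 2]; // fixed_range("version_bytes", 2)
    r.read_exact(&mut version_bytes)?;
    let version = u16::from_be_bytes(version_bytes); // int(..., Endian::Big)
    let mut body = Vec::new(); // remaining_bytes("data_bytes")
    r.read_to_end(&mut body)?;
    Ok(Versioned { version, body })
}

// Serialization: from the rust side toward the serial root.
fn write_versioned(w: &mut impl Write, v: &Versioned) -> io::Result<()> {
    w.write_all(&v.version.to_be_bytes())?;
    w.write_all(&v.body)
}
```

Each node in the schema corresponds to one step here, and every step is reversible, which is what makes the single schema bidirectional.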

Setup

To use Inarybay you define a schema in build.rs, which generates normal rust code.

You need two dependencies:

```
cargo add --build inarybay
cargo add inarybay-runtime
```

The latter has a few helper types and re-exported dependencies.

Defining a schema

Some of the descriptions here are deserialization-oriented, but everything is naturally bidirectional; it's just easier to describe with a reference direction.

  1. Create a schema: `let schema = Schema::new()`
  2. Create a root scope: `let scope = schema.scope(...)`
  3. Define nodes on the scope, like `scope.fixed_range`, `scope.int`, etc., connecting them as you go
  4. Define the rust root using `scope.rust_root`
  5. Generate the code with `schema.generate` and write it to the file you want

A note on arguments

Troubleshooting

Design and implementation

De/serialization is done as a dependency graph traversal - all dependencies are walked before any dependents.

For serialization, the root of the graph is the file/socket/whatever - it depends on each serial segment, which depends on individual data serializations from fields. Later segments depend on earlier segments, so the earlier segments get written first.

For deserialization, the root of the graph is the root rust type, like a rust struct - the root depends on each field to be read, which depends on data transformations, which depend on segments being read. Again, as with serialization, each serial segment depends on the previous segment, so segments are read in order.
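The traversal order described above is a plain topological walk: visit all of a node's dependencies before the node itself. A minimal stdlib sketch of that idea (the node names and edges here are made up for illustration; Inarybay's real graph nodes are schema elements):

```rust
use std::collections::HashMap;

// Depth-first post-order traversal: a node is emitted only after
// every one of its dependencies has been emitted.
fn walk(node: &str, deps: &HashMap<&str, Vec<&str>>, done: &mut Vec<String>) {
    if done.iter().any(|d| d == node) {
        return; // already emitted
    }
    for &dep in deps.get(node).into_iter().flatten() {
        walk(dep, deps, done);
    }
    done.push(node.to_string());
}
```

For serialization the root would be the output stream, depending on segments, which depend on field serializations; for deserialization the root would be the rust struct, depending on fields, transformations, and segment reads.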

Nesting

Nested objects, like arrays and enums, are treated as single nodes with their own graphs internally. I feel like it should be possible to unify this in a single graph but I haven't come up with a way to do it yet.

Writing and memory use

Right now, each node allocates memory for its output when writing. I would like to write directly to the output stream without allocating extra buffers, but at the moment that would make the following scenario difficult. Consider the graph:

The custom node does some complex processing during transformation.

During serialization, the variant node produces three outputs: the enum tag, the enum, and X. Y needs to be written between X and the enum, so there are two options:

  1. Serialize to variables, then write the variables in order
  2. Descend into the enum multiple times

Option 2 would allow writing directly to the stream with no temporary buffers, but both X and Z depend on the custom node, so each descent would need to redo the custom node's transformation, or else per-variant temporary values would somehow need to be stored between descents into the enum.

It may be possible to do that, but it would be significantly more complicated and I don't think the memory use is excessive at the moment.