Peapod

Peapod is a data structure for storing enums super-compactly, like peas in a pod. It works with any enum that implements the Phenotype trait, which captures the behaviour of each variant.

Contents

  1. Motivation
  2. Usage
  3. Technical
  4. How Peapod works
  5. When not to use Peapod

Motivation

We only have so much memory to work with. Especially in space-constrained systems, we want to be particularly efficient. Peapod provides a way of storing enums that can dramatically reduce space usage. You can read more in-depth about the motivation in technical section.

tl;dr: Peapod provides ultra-compact storage for enums!

Usage

You can basically just use Peapod like a normal Vec. Some functionality is impossible though, like iterating over a Peapod without consuming it.

To make an enum suitable for Peapod storage, stick a #[derive(Phenotype)] on it.

```rust use peapod::{Phenotype, Peapod};

fn main() { // The Peapod representation is a lot smaller! // These numbers are in bytes asserteq!(ILovePeas::PEAPODSIZE.unwrap(), 9); asserteq!(std::mem::sizeof::(), 16);

let mut pp = Peapod::new();
pp.push(ILovePeas::SnowPea);
pp.push(ILovePeas::Edamame(0x9EA90D));
pp.push(ILovePeas::GeneticPea {
    wrinkled: true,
    yellow: true,
});

for pea in pp {
    // do something with pea!
}

}

[derive(Phenotype)] // <- this is where the magic happens

enum ILovePeas { Edamame(usize), SnowPea, GeneticPea { wrinkled: bool, yellow: bool }, } ```

Technical

enums (also known as tagged unions) are represented in memory by a tag (integer) and a union. The tag specifies how the bits of the union are interpreted. For example, a tag of 0 might mean "read the union as Result::Ok(_)", while a tag of 1 would mean "read the union as Result::Err(_)".

Because of alignment reasons, the compiler has to lay out enums so that the tag takes up a more space than need be. If there are only two variants, we only need one bit to keep track of which variant something is. Take this pretty drastic example:

rust enum Two { First(usize), Second(usize) } // mem::size_of::<Two> == 16

Since the size of each variant is 8 bytes, and the size of the enum is 16 bytes, 8 bytes are being used for the tag! 63 bits are being wasted! We can do better.

Peapod works by "cleaving" an enum into tag and union. Tags are stored together in a bitvec type so that no space is wasted due to alignment. All the data from the enums (in union form) is also stored together.

This drawing illustrates the previous example:

``` Scale: 1 - == 1 byte

Standard: +--------+--------+ | tag | data | +--------+--------+ ^ Only this byte is actually needed to store the tag

Standard array: +--------+--------+--------+--------+--------+--------+ | tag | data | tag | data | tag | data | . . . +--------+--------+--------+--------+--------+--------+

Peapod: +-+--------+ | | data | +-+--------+ ^ tag

Peapod array: +-+ +--------+--------+--------+ | | + | data | data | data | . . . +-+ +--------+--------+--------+ ^ many tags can be packed into one byte, we could hold 5 more tags in this byte ```

How does it do it?

Preface: compiler people I beg your forgiveness.

The magic is in the Phenotype trait, which has two very important methods: cleave and reknit.

rust type Value; fn cleave(self) -> (usize, Self::Value) fn reknit(tag: usize, value: Self::Value) -> Self

The type Value is some type that can hold all the data from each enum variant. It should be a union.

cleave tags a concrete instance of an enum and splits it into a tag (this tag is internal to Phenotype, unrelated to the compiler's) and a Self::Value. reknit does the opposite and takes a tag and a Self::Value, and reconstitutes it into an enum variant.

The implementation all happens with the wizardry that is proc-macros. #[derive(Phenotype)] is the workhorse of this project.

The #[derive(Phenotype)] takes a look at your enum and first generates some "auxilliary" types like so:

```rust enum ThreeTypes { NamedFields { one: T, two: usize }, Tuple(usize, usize), Empty }

// Represents the NamedFields variant struct __PhenotypeInternalThreeTypesNamedFieldsData { one: T, two: usize, }

// Represents the Tuple variant struct __PhenotypeInternalThreeTypesTupleData(usize, usize);

[allow(nonsnakecase)]

union __PhenotypeInternalThreeTypesData { NamedFields: ManuallyDrop<__PhenotypeInternalThreeTypesNamedFieldsData>, Tuple: ManuallyDrop<__PhenotypeInternalThreeTypesTupleData>, Empty: (), } ```

Then, it generates the cleave method. The generated code for this example looks like:

```rust fn cleave(self) -> (usize, Self::Value) { match &*ManuallyDrop::new(self) { ThreeTypes::Empty => (2usize, PhenotypeInternalThreeTypesData { Empty: () }), ThreeTypes::Tuple(0, _1) => ( 1usize, _PhenotypeInternalThreeTypesData { Tuple: ManuallyDrop::new(PhenotypeInternalThreeTypesTupleData( unsafe { ::core::ptr::read(0) }, unsafe { ::core::ptr::read(1) }, )), }, ), ThreeTypes::NamedFields { one, two } => ( 0usize, PhenotypeInternalThreeTypesData { NamedFields: ManuallyDrop::new(PhenotypeInternalThreeTypesNamedFieldsData::< T,

{ one: unsafe { ::core::ptr::read(one) }, two: unsafe { ::core::ptr::read(two) }, }), }, ), } } ```

All we're doing is matching on the enum variant and reading out each field into the correct auxiliary struct.

cleave does the opposite. Based on the tag, it reads the union and generates and enum variant from the data contained in the auxiliary struct.

rust fn reknit(tag: usize, value: Self::Value) -> ThreeTypes<T> { match tag { 2usize => ThreeTypes::Empty, 1usize => { let data = ManuallyDrop::<__PhenotypeInternalThreeTypesTupleData>::into_inner(unsafe { value.Tuple }); ThreeTypes::Tuple(data.0, data.1) } 0usize => { let data = ManuallyDrop::<__PhenotypeInternalThreeTypesNamedFieldsData<T>>::into_inner( unsafe { value.NamedFields }, ); ThreeTypes::NamedFields { one: data.one, two: data.two, } } _ => unreachable!(), } }

When not to use Peapod

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.