Block Array Cow
In memory array de-duplication, useful for efficiently storing many versions of data.
This is suitable for storing undo history for example - where the size of a struct can be used as the stride, and is effective with both binary and text data.
The code is Apache2.0 licensed and doesn't have any dependencies.
For an undo system (or any other history storage) you may want to store many versions of your data.
In some cases it makes sense to write a
persistent data structure <https://en.wikipedia.org/wiki/Persistent_data_structure>
__
but this depends a lot on the kind of data you're dealing with.
In other cases its nice to have the convenience of being able to serialize your data and store it without worrying about the details of how duplication is managed.
That's the motivation for writing this library.
BArrayStore
is created with a fixed stride and block size.If a mismatch is found, the reference blocks use a lazily initialized hash of their first N bytes. A hash data for the data being added with a value for each stride offset is calculated too.
Looping over the newly added state data can now perform hash look-ups on the reference chunks and then a full comparison if a match is found.
In that case the following chunks are tested to see if they match (to avoid further lookups), otherwise a new chunk is allocated.
Where N is currently the stride * 7
, see: BCHUNK_HASH_TABLE_ACCUMULATE_STEPS
.
.. note::
This has a slight emphasis on performance, since this method is used in an 3D modelers undo system. Where changes accumulate and are freed at run-time.
The use of fixed sized chunks is better suited to in-memory data storage, compared to a command line utility for creating a one-off binary diff for example - where more exhaustive tests may be preferred.
In general operations that would use excessive calculation are avoided, since there are many possible changes that would improve memory usage at the cost of performance.
Detecting numeric changes to the data (values incremented/decremented, zeroed etc... are not detected).
Blocks either match exactly or not at all.
Some things that may be worth considering.
BCHUNK_HASH_TABLE_ACCUMULATE_STEPS
configurable,
to control how much of each blocks data contributes to its hash.mmap
for data storage.Crates.io <https://crates.io/crates/block-array-cow>
__.API docs <https://docs.rs/block-array-cow>
__.