# MenhirKV

MenhirKV is yet another KV store based on RocksDB and implemented in Rust.

In short, this library lets you store key-value pairs locally, provided the data is serializable. It also guarantees entries will expire at some point, so that disk space usage remains under control. Store your data, never worry about space. Only uninteresting data you never access will automatically disappear.

Most low-level key-value stores offer a `&Vec<u8>` / `&Vec<u8>` or similar interface, which is the right thing to do: let the user of the library figure out the (de)serialization details. MenhirKV figures out those details and makes a few opinionated choices, namely:

* keys and values can be any serializable type, (de)serialized with MessagePack
* entries expire automatically, driven by a Bloom filter hooked into RocksDB compaction
* a capacity, acting as a low limit, is a mandatory parameter

In practice, MenhirKV offers a local, persistent, typed key-value store whose unused entries expire on their own.

But really, nothing new under the Sun, it is only Rust + RocksDB + MessagePack + a Bloom filter mashed together.
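
As a rough illustration of the (de)serialization plumbing MenhirKV takes off your hands, here is what storing a typed value in a raw byte-oriented store involves when done by hand with MessagePack. This is only a sketch: it uses serde together with the rmp-serde crate, and whether MenhirKV relies on rmp-serde internally is an assumption, not something this README states.

```rust
// Sketch only: manual MessagePack (de)serialization with serde + rmp-serde,
// the kind of plumbing a raw byte-oriented store leaves to the caller.
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct User {
    id: u64,
    name: String,
}

fn main() {
    let user = User { id: 1, name: "menhir".to_string() };

    // Encode to MessagePack bytes: this is what a byte-oriented store accepts.
    let bytes: Vec<u8> = rmp_serde::to_vec(&user).unwrap();

    // Decode back after a hypothetical raw `get`.
    let decoded: User = rmp_serde::from_slice(&bytes).unwrap();
    assert_eq!(user, decoded);
}
```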

## Status

For now this is a toy project, clearly NOT suitable for production use.

## Usage

```rust
use menhirkv::Store;

// Example with a key: usize, value: usize store,
// feel free to use your own types, obviously.
let store: Store<usize, usize> = Store::open_temporary(100).unwrap();
store.put(&123, &456).unwrap();
assert_eq!(Some(456), store.get(&123).unwrap());
```

## About capacity

This is possibly the most unusual and controversial choice made by MenhirKV, so let's dive a bit deeper into it.

You set a capacity, which is a LOW limit.

It is a mandatory parameter: a basic store opening typically requires two parameters, a path and a capacity.

To give an example, if you set a capacity of 10k (ten thousand), then you have the guarantee (*) that 10k entries will remain stored, with no expiration. You may end up with up to 50k or maybe even 100k entries stored on disk. But at some point, depending on how RocksDB runs its internal compactions and how the Bloom filter behaves, both of which are unpredictable, the data will be filtered, compacted, and the "old" keys removed.

What "old" means refers to the last time the key was accessed. The entry may have been the first one ever written to the database, if it keeps being accessed, either read or write, it remains on top of the list of keys to preserve.

LRU caches do this in a very predictable manner, but they are costly to maintain, especially for a persistent store. I made a toy project (**) around this and can tell it does not perform well. Most of the time the fuzzy strategy described above is good enough. It ensures two things: entries you keep accessing remain available, and disk space usage stays under control.

The implementation trick that makes this efficient is a custom compaction filter hook: with it, the cost of expiring unused entries is close to zero, since those bits of data would have been processed by RocksDB anyway. What MenhirKV does is only give a hint to RocksDB, at the very moment it figures out how to compact the data and reorganize it on disk -> "oh well, you know what, we don't need this, just drop it on the floor".
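
To make the trick concrete, here is a minimal sketch of the mechanism using the `rocksdb` crate directly. This is not MenhirKV's code: a plain `HashSet` of recently accessed keys stands in for the Bloom filter, and the database path and filter name are made up.

```rust
// Sketch only: a custom compaction filter that drops entries which were not
// accessed recently. The expiration work piggybacks on compactions RocksDB
// would have run anyway.
use std::collections::HashSet;
use std::sync::{Arc, RwLock};

use rocksdb::compaction_filter::Decision;
use rocksdb::{Options, DB};

fn main() {
    // Keys "seen recently"; MenhirKV tracks this with a Bloom filter instead.
    let recent: Arc<RwLock<HashSet<Vec<u8>>>> = Arc::new(RwLock::new(HashSet::new()));

    let mut opts = Options::default();
    opts.create_if_missing(true);

    // The hook: when RocksDB compacts, keep recently accessed keys and drop
    // the rest on the floor.
    let recent_for_filter = Arc::clone(&recent);
    opts.set_compaction_filter(
        "drop-unused",
        move |_level: u32, key: &[u8], _value: &[u8]| -> Decision {
            if recent_for_filter.read().unwrap().contains(key) {
                Decision::Keep
            } else {
                Decision::Remove
            }
        },
    );

    let db = DB::open(&opts, "/tmp/compaction-filter-demo").unwrap();

    // Every put (or get) would mark the key as recently used.
    db.put(b"hot", b"value").unwrap();
    recent.write().unwrap().insert(b"hot".to_vec());

    // Keys never marked as recent disappear whenever RocksDB decides to
    // compact the files that contain them.
    db.put(b"cold", b"value").unwrap();
}
```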

(*) Well, almost: in edge cases, the number of kept entries may go a bit below the planned capacity. This is because of Bloom filter implementation and usage details, but statistically the store keeps more entries than the requested capacity. Think of this capacity setting as a fuzzy limit. If you really need a precise number, MenhirKV is not for you.

(**) DiskLRU, a toy project experimenting with persistent LRU. Working on it helped me a lot when making decisions for MenhirKV.

## Links

## License

MenhirKV is licensed under the MIT license.