OFilter

OFilter is an opinionated Bloom filter implemented in Rust.

It implements:

The basic Bloom filter is inspired from the existing and stable bloomfilter crate but does not directly depend on it. The API is slightly different, and makes a few opinionated changes.

In practice, I am using this filter to implement a self-expiring KV store.

OFilter icon

Status

For now this is a toy project, clearly NOT suitable for production use.

Build Status Crates.io Gitlab License

https://en.wikipedia.org/wiki/Bloom_filter

Usage

```rust use ofilter::Bloom;

let mut filter: Bloom = Filter::new(100); assert!(!filter.check(&42)); filter.set(&42); assert!(filter.check(&42)); ```

A bit of theory

This interactive Bloom filter params calulator proved very useful while developping this.

Also it recaps the usual formulas:

``` With: n: number of items in the filter p: probability of false positives, fraction between 0 and 1 m: number of bits in the filter k: number of hash functions

We have: n = ceil(m / (-k / log(1 - exp(log(p) / k)))) p = pow(1 - exp(-k / (m / n)), k) m = ceil((n * log(p)) / log(1 / pow(2, log(2)))); k = round((m / n) * log(2)); ```

A command line equivalent would be this practical bloom-filter-calculator.rb gist

```ruby

Optimal bloom filter size and number of hashes

Tips:

1. One byte per item in the input set gives about a 2% false positive rate.

2. The optimal number of hash functions is ~0.7x the number of bits per item.

3. The number of hashes dominates performance.

Expected number of items in the collection

n = (m * ln(2))/k;

n = 300_000

Acceptable false-positive rate (0.01 = 1%)

p = e^(-(m/n) * (ln(2)^2));

fpr = 0.01

Optimal size (number of elements in the bit array)

m = -((n*ln(p))/(ln(2)^2));

m = (n * Math.log(fpr).abs) / (Math.log(2) ** 2)

Optimal number of hash functions

k = (m/n) * ln(2);

k = (m / n) * Math.log(2)

puts "Optimal bloom filter size: #{m.ceil} bits" puts "Optimal number of hash functions: #{k.ceil}" ```

Links

License

OFilter is licensed under the MIT license.