Bloomberg

A scalable bloom filter written in Rust for flexible, space-efficient lookup tables.

Bloom filters in general are bitsets that achieve very high storage savings by accepting a negligible, tunable expected false positive rate while admitting absolutely no false negatives. This is extremely convenient in situations where you can tolerate some error but must absolutely handle cases of nonexistence. One example is an IP blacklist filter emergency-deployed during a DDoS attack: it is acceptable to block a few legitimate IPs (false positives), but no blacklisted IP may slip through (a false negative).
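To make the "no false negatives" property concrete, here is a minimal, self-contained sketch of a plain bloom filter in Rust. The names (`Bloom`, `insert`, `contains`) are illustrative only and are not this crate's API; a real implementation would use a packed bitset and better-distributed hash functions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustrative bloom filter: m bits, k hash functions (hypothetical names).
struct Bloom {
    bits: Vec<bool>,
    k: usize,
}

impl Bloom {
    fn new(m: usize, k: usize) -> Self {
        Bloom { bits: vec![false; m], k }
    }

    // Derive k indices by seeding the hasher with the function index.
    fn indices<T: Hash>(&self, item: &T) -> Vec<usize> {
        (0..self.k)
            .map(|seed| {
                let mut h = DefaultHasher::new();
                seed.hash(&mut h);
                item.hash(&mut h);
                (h.finish() as usize) % self.bits.len()
            })
            .collect()
    }

    fn insert<T: Hash>(&mut self, item: &T) {
        for i in self.indices(item) {
            self.bits[i] = true;
        }
    }

    // May return a false positive, but never a false negative.
    fn contains<T: Hash>(&self, item: &T) -> bool {
        self.indices(item).iter().all(|&i| self.bits[i])
    }
}

fn main() {
    let mut blacklist = Bloom::new(1024, 4);
    blacklist.insert(&"203.0.113.7");
    // An inserted item is always reported as present.
    assert!(blacklist.contains(&"203.0.113.7"));
}
```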

Unfortunately, bloom filters have a fixed size, and it is not always possible to anticipate the required capacity in advance. A scalable bloom filter overcomes this limitation by adding a new filter every time the last one becomes saturated, with a larger size and a smaller false positive probability (which keeps the compound error bounded).
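The growth policy above can be sketched in Rust as follows. This is a deliberately simplified model (hypothetical names, a single hash per generation, growth ratio s = 2), not this crate's implementation; a real SBF would also use multiple slices per filter and tighten the error ratio r per generation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// One generation: a bit array plus a saturation threshold (illustrative only).
struct Generation {
    bits: Vec<bool>,
    count: usize,    // items inserted into this generation
    capacity: usize, // saturation threshold
}

struct ScalableBloom {
    gens: Vec<Generation>,
}

// Single seeded hash, standing in for a filter's k hash functions.
fn index<T: Hash>(item: &T, seed: usize, m: usize) -> usize {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    item.hash(&mut h);
    (h.finish() as usize) % m
}

impl ScalableBloom {
    fn new(initial_bits: usize, capacity: usize) -> Self {
        ScalableBloom {
            gens: vec![Generation { bits: vec![false; initial_bits], count: 0, capacity }],
        }
    }

    fn insert<T: Hash>(&mut self, item: &T) {
        // When the newest generation saturates, add a larger one (s = 2).
        if self.gens.last().unwrap().count >= self.gens.last().unwrap().capacity {
            let m = self.gens.last().unwrap().bits.len() * 2;
            let cap = self.gens.last().unwrap().capacity * 2;
            self.gens.push(Generation { bits: vec![false; m], count: 0, capacity: cap });
        }
        let seed = self.gens.len() - 1;
        let g = self.gens.last_mut().unwrap();
        let i = index(item, seed, g.bits.len());
        g.bits[i] = true;
        g.count += 1;
    }

    // An item may live in any generation, so query all of them.
    fn contains<T: Hash>(&self, item: &T) -> bool {
        self.gens
            .iter()
            .enumerate()
            .any(|(seed, g)| g.bits[index(item, seed, g.bits.len())])
    }
}

fn main() {
    let mut f = ScalableBloom::new(64, 4);
    for ip in ["a", "b", "c", "d", "e", "f"] {
        f.insert(&ip);
    }
    assert!(f.contains(&"a")); // no false negatives across generations
    assert_eq!(f.gens.len(), 2); // one growth step after saturating the first filter
}
```

Note that queries must check every generation, so lookups get slightly slower as the filter grows; this is the price paid for unbounded capacity.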

For more information, see: http://gsd.di.uminho.pt/members/cbm/ps/dbloom.pdf

Brief description from the literature (symbols are borrowed from the same source):

- An SBF starts with one filter with k0 slices and error probability P0.
- If this filter fills up, a new one is added with k1 slices and error P1 = r * P0 (so the error always decreases).
- We choose k0 = log2(P0^-1) and ki = log2(Pi^-1) = k0 + i * log2(r^-1).
- We also select r = 0.8 based on the literature.
- Interestingly, the size of each generation of the filter can be chosen orthogonally to the false positive probability.
- Specifically, the filters should increase in size by some ratio s, e.g. 2 or 4 (other values are not recommended).
- The literature notes briefly that the error has an upper bound of P <= P0 * (1 / (1 - r)), so P0 >= P * (1 - r) = 0.2 * P.
- Bloom filters in general saturate once n = M * (ln 2)^2 / ln(P^-1), where M is the number of bits.
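The parameter schedule above is easy to compute directly. The helper names below (`slices`, `error`, `error_bound`) are hypothetical, and P0 = 0.001 is just an example target:

```rust
// ki = log2(Pi^-1) = k0 + i * log2(r^-1)
fn slices(p0: f64, r: f64, i: i32) -> f64 {
    (1.0 / p0).log2() + (i as f64) * (1.0 / r).log2()
}

// Pi = r^i * P0: each generation's error shrinks geometrically.
fn error(p0: f64, r: f64, i: i32) -> f64 {
    r.powi(i) * p0
}

// Compound error bound: P <= P0 * (1 / (1 - r)).
fn error_bound(p0: f64, r: f64) -> f64 {
    p0 / (1.0 - r)
}

fn main() {
    let (p0, r) = (0.001_f64, 0.8_f64);
    for i in 0..4 {
        println!("gen {}: k = {:.2}, P = {:.6}", i, slices(p0, r, i), error(p0, r, i));
    }
    // With r = 0.8 the total error never exceeds 5x the initial target.
    println!("compound bound: {}", error_bound(p0, r));
}
```

With r = 0.8, each new generation adds log2(1/0.8) ≈ 0.32 slices on average, so the per-generation cost of tightening the error is modest.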

TODO: This needs tests! Please contribute!