This crate is a Rust port of Daniel Lemire's simdcomp C library. It contains different implementation of integers compression via bitpacking. Each implementation requires on a different CPU SIMD instruction set, for state of the art performance.
This crate is used by tantivy
.
For instance, with SSE3
, you can typically expect more than 4 billions ints per seconds.
Check the Benchmark for more accurate details.
Compressing small integers or increasing integers is a very common problem in search engines, databases, analytics. The latter can be reduced to the earlier via delta-encoding (i.e. Encoding the difference between consecutive difference rather than the integers themselves). Traditional compression scheme like LZ4 is really not very efficient to address this problem.
There are different families of solutions to this problem. One of the most straightforward and efficient one is bitpacking
:
128
when using the SSE2 implementation).b
that makes it possible to represent all of the integers. In other words, the smallest b
such that all integers in the block are stricly smaller than 2n.b
-bits.For instance, assuming a block of 4
, when encoding 4, 9, 3, 2
. Assuming that the highest value in the block is 9, b = 4
. All values will then be encoded over 4 bits as follows.
| original number | binary representation | |:----------------|:----------------------| | 4 | 0100 | | 9 | 1001 | | 3 | 0011 | | 2 | 0010 |
As a result
This crate contains different implementation, depending on the instruction set available with your processor. The resulting format are not compatible one with each other.
Currently the following are available : - scalar: implementation not using any SIMD instruction. This implementation is still very performant - SSE2: - AVX
```rust extern crate bitpacking; use bitpacking::{SSE3BitPacker, BitPacker};
// ... let numbits: u8 = SSE3BitPacker::numbits(&fakedata); let mut compressed = vec![0u8; (numbits as usize) * SSE3BitPacker::BLOCKLEN / 8]; SSE3BitPacker::compress(&fakedata, &mut compressed[..], num_bits); ```
```rust extern crate bitpacking; use bitpacking::{SSE3BitPacker, BitPacker};
// ... let mut decompressed = vec![0u32; SSE3BitPacker::BLOCKLEN]; SSE3BitPacker::decompress(&compressed, &mut decompressed[..], numbits);
asserteq!(&fakedata, &decompressed); ```
Make sure to compile with
RUSTFLAGS="-C target-cpu=native"