# pco

⚠️ Both the API and the data format are unstable for the 0.0.0-alpha.* releases. Do not depend on pco for long-term storage yet. ⚠️

## Usage as a Standalone Format

```rust
use pco::standalone::{auto_compress, auto_decompress};
use pco::DEFAULT_COMPRESSION_LEVEL;

fn main() {
  // your data
  let mut my_ints = Vec::new();
  for i in 0..100000 {
    my_ints.push(i as i64);
  }

  // Here we let the library choose a configuration with default compression
  // level. If you know about the data you're compressing, you can compress
  // faster by creating a CompressorConfig.
  let bytes: Vec<u8> = auto_compress(&my_ints, DEFAULT_COMPRESSION_LEVEL);
  println!("compressed down to {} bytes", bytes.len());

  // decompress
  let recovered = auto_decompress::<i64>(&bytes).expect("failed to decompress");
  println!(
    "got back {} ints from {} to {}",
    recovered.len(),
    recovered[0],
    recovered.last().unwrap()
  );
}
```

To run something right away, try the benchmarks.

For a lower-level standalone API that allows writing one chunk at a time / streaming reads, see the docs.rs documentation.

## Usage as a Wrapped Format

To embed/interleave pco in another data format, it is better to use the wrapped API and format than standalone. This allows
* fine-level data paging with good compression ratio down to page sizes of >20 numbers (as long as the overall chunk has >2k or so)
* less bloat by omitting metadata that the wrapping format must retain
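The wrapped API itself is not shown here (see the docs.rs documentation for it). As a rough sense of the per-file overhead it avoids, the sketch below uses only the standalone `auto_compress` from the example above and compresses the same numbers once as a single standalone file and once as many small standalone "pages"; the 1000-number page size is an arbitrary choice for illustration.

```rust
use pco::standalone::auto_compress;
use pco::DEFAULT_COMPRESSION_LEVEL;

fn main() {
  let nums: Vec<i64> = (0..100_000).collect();

  // one standalone file for the whole dataset
  let whole = auto_compress(&nums, DEFAULT_COMPRESSION_LEVEL).len();

  // one standalone file per 1000-number page; each page pays its own
  // header and chunk metadata, which the wrapped format lets you avoid
  let paged: usize = nums
    .chunks(1000)
    .map(|page| auto_compress(page, DEFAULT_COMPRESSION_LEVEL).len())
    .sum();

  println!("single file: {} bytes; 100 standalone pages: {} bytes", whole, paged);
}
```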

## Advanced

### Custom Data Types

Small data types can be compressed efficiently by expanding them to a larger data type: for example, compressing u16 data as a sequence of u32 values. The only cost of using a larger data type is a very small increase in chunk metadata size.
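As a minimal sketch of that widening approach, reusing the standalone functions from the example above (the modular data here is just an arbitrary placeholder):

```rust
use pco::standalone::{auto_compress, auto_decompress};
use pco::DEFAULT_COMPRESSION_LEVEL;

fn main() {
  // u16 data we want to store
  let my_u16s: Vec<u16> = (0..10_000).map(|i| (i % 1000) as u16).collect();

  // widen to u32, a supported data type
  let widened: Vec<u32> = my_u16s.iter().map(|&x| x as u32).collect();
  let bytes = auto_compress(&widened, DEFAULT_COMPRESSION_LEVEL);

  // narrow back after decompression
  let recovered: Vec<u16> = auto_decompress::<u32>(&bytes)
    .expect("failed to decompress")
    .iter()
    .map(|&x| x as u16)
    .collect();
  assert_eq!(recovered, my_u16s);
}
```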

When necessary, you can implement your own data type via pco::data_types::NumberLike and (if the existing implementations are insufficient) pco::data_types::UnsignedLike and pco::data_types::FloatLike.

### Seeking and Statistics

Each chunk has a metadata section containing
* the total count of numbers in the chunk,
* the bins for the chunk and relative frequency of each bin,
* and the size in bytes of the compressed body.

Using the compressed body size, it is easy to seek through the whole file and collect a list of all the chunk metadatas. One can aggregate them to obtain the total count of numbers in the whole file and even an approximate histogram. This is typically about 100x faster than decompressing all the numbers.
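A self-contained sketch of that aggregation follows. The `ChunkSummary` struct and its fields are hypothetical stand-ins for whatever per-chunk metadata type the decompressor exposes (see docs.rs for the real names), not part of pco's API.

```rust
// Hypothetical stand-in for the per-chunk metadata described above.
struct ChunkSummary {
  count: usize,                // total numbers in the chunk
  compressed_body_size: usize, // bytes occupied by the compressed body
}

fn main() {
  // Pretend we collected these while scanning the file front to back.
  let chunks = vec![
    ChunkSummary { count: 100_000, compressed_body_size: 40_000 },
    ChunkSummary { count: 50_000, compressed_body_size: 21_000 },
  ];

  // Seeking: each chunk's body can be skipped using its compressed body size.
  // (In a real file you would also account for the bytes each metadata
  // section itself occupies when computing absolute offsets.)
  let skipped: usize = chunks.iter().map(|c| c.compressed_body_size).sum();
  println!("skipped {} bytes of compressed bodies without decompressing", skipped);

  // Statistics: aggregate counts without decompressing any numbers.
  // The bins and their relative frequencies could be merged in the same pass
  // to build an approximate histogram of the whole file.
  let total: usize = chunks.iter().map(|c| c.count).sum();
  println!("file contains {} numbers in total", total);
}
```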