Multithreaded encoding.
This crate provides a near drop-in replacement for `Write` that compresses chunks of data in parallel and writes them to an underlying writer in the same order that the bytes were handed to it. This allows for much faster compression of data.
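As a minimal sketch of what "drop-in" means here (assuming the `pargz` feature is enabled; `write_records` is a hypothetical caller that only knows about `Write`):

```rust
use std::io::Write;

use gzp::pargz::ParGz;

// Any code written against `Write` can accept a `ParGz` unchanged.
fn write_records<W: Write>(out: &mut W) -> std::io::Result<()> {
    for i in 0..3 {
        writeln!(out, "record {}", i)?;
    }
    Ok(())
}

fn main() {
    // Chunks are compressed on worker threads but reach stdout in order.
    let mut pargz = ParGz::builder(std::io::stdout()).build();
    write_records(&mut pargz).unwrap();
    pargz.finish().unwrap();
}
```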
There are no features enabled by default. `pargz_default` enables `pargz` and `flate2_default`. `flate2_default` in turn uses the default backend for `flate2`, which, as of this writing, is the pure Rust backend. `parsnap_default` is just a wrapper over `parsnap`, which pulls in the `snap` dependency.
The following examples demonstrate common ways to override the features:
Simple way to enable both `pargz` and `parsnap`:

```toml
[dependencies]
gzp = { version = "*", features = ["pargz_default", "parsnap_default"] }
```
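With both enabled, the two compressors can be used side by side. A minimal sketch (the output file names are illustrative):

```rust
use std::{fs::File, io::Write};

use gzp::{pargz::ParGz, parsnap::ParSnap};

fn main() {
    // Gzip-compress via the `pargz` feature.
    let mut gz = ParGz::builder(File::create("out.txt.gz").unwrap()).build();
    gz.write_all(b"some data\n").unwrap();
    gz.finish().unwrap();

    // Snappy-compress the same data via the `parsnap` feature.
    let mut sz = ParSnap::builder(File::create("out.txt.sz").unwrap()).build();
    sz.write_all(b"some data\n").unwrap();
    sz.finish().unwrap();
}
```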
Use `pargz` to pull in `flate2` and `zlib-ng-compat` to use the `zlib-ng-compat` backend for `flate2` (most performant):

```toml
[dependencies]
gzp = { version = "*", features = ["pargz", "zlib-ng-compat"] }
```
To use Snap (could also use `parsnap_default`):

```toml
[dependencies]
gzp = { version = "*", default-features = false, features = ["parsnap"] }
```
To use both Snap and Gzip with a specific backend:

```toml
[dependencies]
gzp = { version = "*", default-features = false, features = ["parsnap_default", "pargz", "zlib-ng-compat"] }
```
Simple example
```rust
use std::{env, fs::File, io::Write};

use gzp::pargz::ParGz;

fn main() {
    let file = env::args().skip(1).next().unwrap();
    let writer = File::create(file).unwrap();
    let mut pargz = ParGz::builder(writer).build();
    pargz.write_all(b"This is a first test line\n").unwrap();
    pargz.write_all(b"This is a second test line\n").unwrap();
    pargz.finish().unwrap();
}
```
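The call to `finish` matters: it flushes any chunks still being compressed and joins the worker threads, so it should be made before the writer goes out of scope.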
An updated version of `pgz`:
```rust
use std::io::{Read, Write};

use gzp::pargz::ParGz;

fn main() {
    let chunksize = 64 * (1 << 10) * 2; // 128 KiB read chunks

    let stdout = std::io::stdout();
    let mut writer = ParGz::builder(stdout).build();

    let stdin = std::io::stdin();
    let mut stdin = stdin.lock();

    let mut buffer = Vec::with_capacity(chunksize);
    loop {
        let mut limit = (&mut stdin).take(chunksize as u64);
        limit.read_to_end(&mut buffer).unwrap();
        if buffer.is_empty() {
            break;
        }
        writer.write_all(&buffer).unwrap();
        buffer.clear();
    }
    writer.finish().unwrap();
}
```
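Compiled as a small binary (hypothetically named `pgz`), this reads plain text on stdin and writes gzip to stdout: `cat file.txt | pgz > file.txt.gz`.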
Same thing but using Snappy instead.
```rust
use std::io::{Read, Write};

use gzp::parsnap::ParSnap;

fn main() {
    let chunksize = 64 * (1 << 10) * 2; // 128 KiB read chunks

    let stdout = std::io::stdout();
    let mut writer = ParSnap::builder(stdout).build();

    let stdin = std::io::stdin();
    let mut stdin = stdin.lock();

    let mut buffer = Vec::with_capacity(chunksize);
    loop {
        let mut limit = (&mut stdin).take(chunksize as u64);
        limit.read_to_end(&mut buffer).unwrap();
        if buffer.is_empty() {
            break;
        }
        writer.write_all(&buffer).unwrap();
        buffer.clear();
    }
    writer.finish().unwrap();
}
```
Notes

- The gzip output is multiple gzip blocks concatenated together, so it must be read with `flate2::bufread::MultiGzDecoder`, which keeps decoding across member boundaries.

Future todos

- Explore removing `Bytes` in favor of a raw `Vec`.
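As a minimal sketch of the reading side (assuming the compressed stream arrives on stdin):

```rust
use std::io::Read;

use flate2::bufread::MultiGzDecoder;

fn main() {
    let stdin = std::io::stdin();
    // A plain gzip decoder would stop at the first member boundary;
    // MultiGzDecoder decodes all concatenated members.
    let mut decoder = MultiGzDecoder::new(stdin.lock());
    let mut contents = String::new();
    decoder.read_to_string(&mut contents).unwrap();
    print!("{}", contents);
}
```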
All benchmarks were run on the file in `./bench-data/shakespeare.txt` catted together 100 times, which creates a roughly 550 MB file.

Note that there are far more comprehensive comparisons of the tradeoffs in different compression algorithms and compression levels elsewhere on the internet. This is meant to give a rough understanding of the tradeoffs involved.
| Name      | Num Threads | Compression Level | Buffer Size | Time | File Size |
| --------- | ----------- | ----------------- | ----------- | ---- | --------- |
| Gzip Only | NA          | 3                 | 128 KB      | 6.6s | 218 MB    |
| Gzip      | 1           | 3                 | 128 KB      | 2.4s | 223 MB    |
| Gzip      | 4           | 3                 | 128 KB      | 1.2s | 223 MB    |
| Gzip      | 8           | 3                 | 128 KB      | 0.8s | 223 MB    |
| Gzip      | 16          | 3                 | 128 KB      | 0.6s | 223 MB    |
| Gzip      | 30          | 3                 | 128 KB      | 0.6s | 223 MB    |
| Snap Only | NA          | NA                | 128 KB      | 1.6s | 333 MB    |
| Snap      | 1           | NA                | 128 KB      | 0.7s | 333 MB    |
| Snap      | 4           | NA                | 128 KB      | 0.5s | 333 MB    |
| Snap      | 8           | NA                | 128 KB      | 0.4s | 333 MB    |
| Snap      | 16          | NA                | 128 KB      | 0.4s | 333 MB    |
| Snap      | 30          | NA                | 128 KB      | 0.4s | 333 MB    |