sss

A short-string compressor/decompressor that can encode any of 20,000+ words in three bytes or fewer.

Like smaz, sss compresses short strings, something that conventional compression algorithms struggle with.

However, sss typically compresses better than smaz, particularly for very common words: out of the 10,000 most common words, only 1% compressed better with smaz. sss can also represent numbers, repeated sequences and non-alphanumeric characters more efficiently than smaz. It can encode Unicode characters, but not very efficiently. If your text includes only a few Unicode characters, sss should still compress it better, but if your strings are mostly Unicode characters, other schemes such as Unishox are a better choice.

Cost

sss uses several tables with over 18,000 entries in total. This incurs a significant cost in runtime memory and binary size, but if you have the memory available, the more effective compression is worth it.

To match input against these tables, we currently use a naive algorithm that loops over every entry in every table to find the best match. Future versions will use a phf hash table approach.
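The brute-force search described above can be sketched roughly as follows. This is an illustration only, not the crate's actual code: the `TABLES` contents, the `Match` struct and the `best_match` function are hypothetical names, and "best" is assumed here to mean the longest entry that is a prefix of the remaining input.

```rust
/// Hypothetical stand-ins for the real tables, which hold thousands of entries.
const TABLES: &[&[&str]] = &[
    &["the", "and", "that"],
    &["there", "then", "an"],
];

/// A match: which table, which entry, and how many bytes of input it covers.
#[derive(Debug, PartialEq)]
struct Match {
    table: usize,
    entry: usize,
    len: usize,
}

/// Brute-force search: test every entry in every table against the start of
/// `input` and keep the longest one that matches (assumed meaning of "best").
fn best_match(input: &str, tables: &[&[&str]]) -> Option<Match> {
    let mut best: Option<Match> = None;
    for (t, table) in tables.iter().enumerate() {
        for (e, &word) in table.iter().enumerate() {
            // Only consider entries strictly longer than the current best.
            let longer = best.as_ref().map_or(true, |b| word.len() > b.len);
            if longer && input.starts_with(word) {
                best = Some(Match { table: t, entry: e, len: word.len() });
            }
        }
    }
    best
}

fn main() {
    // "there" (table 1, entry 0) beats the shorter "the" (table 0, entry 0).
    println!("{:?}", best_match("there is", TABLES));
}
```

Every lookup in this sketch costs time proportional to the total number of entries, which is why a hash-based lookup would be attractive for tables of this size.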

Examples

Using examples directly from smaz we have:

[Insert examples]

We can see that every example is compressed more with sss than with smaz.

Encoding

The Snaz encoding is as follows: