Bao Spec — Rust Crate — Rust Docs
Bao is an implementation of BLAKE3 verified streaming, as described in Section 6.4 of the BLAKE3 spec. Tree hashes like BLAKE3 make it possible to verify part of a file without re-hashing the entire thing, using an encoding format that stores the bytes of the file together with all the nodes of its hash tree. Clients can stream this encoding, or do random seeks into it, while verifying that every byte they read matches the root hash. For the details of how this works, see the Bao spec.
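To make the idea of verifying part of a file concrete, here is a toy sketch of a binary hash tree. It is not the BLAKE3 tree or the Bao encoding: it uses SHA-256 from the Python standard library, assumes a power-of-two number of chunks, and every function name in it is made up for illustration. It only shows the general principle that one chunk can be checked against a root hash using a handful of sibling hashes.

```python
import hashlib

def leaf_hash(chunk: bytes) -> bytes:
    # Prefix bytes keep leaf and parent hashes in separate domains (toy scheme).
    return hashlib.sha256(b"\x00" + chunk).digest()

def parent_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()

def build_levels(chunks):
    """Return every level of the tree, leaves first. Assumes len(chunks) is a power of two."""
    level = [leaf_hash(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        level = [parent_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def proof_for(levels, index):
    """Collect the sibling hash at each level, from the leaf up to just below the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])
        index //= 2
    return proof

def verify_chunk(root, chunk, index, proof):
    """Recompute the path from one chunk to the root using only the sibling hashes."""
    node = leaf_hash(chunk)
    for sibling in proof:
        if index % 2 == 0:
            node = parent_hash(node, sibling)
        else:
            node = parent_hash(sibling, node)
        index //= 2
    return node == root

chunks = [bytes([i]) * 1024 for i in range(8)]
levels = build_levels(chunks)
root = levels[-1][0]
proof = proof_for(levels, 5)

print(verify_chunk(root, chunks[5], 5, proof))    # True
print(verify_chunk(root, b"tampered", 5, proof))  # False
```

Checking chunk 5 of 8 needs three sibling hashes rather than the other seven chunks, which is why a verifier does not have to re-hash the whole file.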
This project includes two Rust crates, the `bao` library crate and the `bao_bin` binary crate. The latter provides the `bao` command line utility.
Caution! Bao is beta cryptography software. It has not been formally audited yet.
Use case: A secure messaging app might support attachment files by including the hash of an attachment in the metadata of a message. With a serial hash, the recipient would need to download the entire attachment to verify it, but that can be impractical for things like large video files. With BLAKE3 and Bao, the recipient can stream a video attachment, while still verifying each byte as it comes in. (This scenario was the original motivation for the Bao project.)
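```sh
# Create an input file containing a megabyte of random data.
head -c 1000000 /dev/urandom > f

# Convert it into a Bao encoded file.
bao encode f f.bao

# Compare the sizes of the two files.
stat -c "%n %s" f f.bao | column -t
# f      1000000
# f.bao  1062472

# Compute the hash of the original file. (The b3sum tool would also work here.)
hash=$(bao hash f)

# Decode the encoded file using that hash, and compare with the original.
bao decode $hash < f.bao > f2
cmp f f2

# Decoding with the wrong hash results in an error.
bad_hash="0000000000000000000000000000000000000000000000000000000000000000"
bao decode $bad_hash < f.bao
# Error: Custom { kind: InvalidData, error: StringError("hash mismatch") }
```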
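The size of `f.bao` above can be reproduced with a little arithmetic. The sketch below assumes the layout described in the Bao spec as I understand it: an 8-byte length header, 1024-byte chunks, and one 64-byte parent node (two 32-byte hashes) for every internal node of the tree. The function names are illustrative and are not part of the `bao` crate.

```python
CHUNK_SIZE = 1024   # content bytes per leaf chunk (assumed from the Bao spec)
PARENT_SIZE = 64    # two 32-byte child hashes per parent node
HEADER_SIZE = 8     # content length header

def count_chunks(content_len: int) -> int:
    # Ceiling division; an empty input still counts as one (empty) chunk.
    return max(1, -(-content_len // CHUNK_SIZE))

def encoded_size(content_len: int) -> int:
    # A binary tree with n leaves has n - 1 parent nodes.
    parents = count_chunks(content_len) - 1
    return HEADER_SIZE + content_len + parents * PARENT_SIZE

size = encoded_size(1_000_000)
print(size)                                    # 1062472
print(f"{(size - 1_000_000) / 1_000_000:.2%}")  # 6.25%
```

A megabyte of content splits into 977 chunks, so the tree has 976 parent nodes, and the overhead stays near 6% regardless of file size.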
Encoded files support random seeking, but seeking might not be available or efficient over the network. (Note that one seek in the content usually requires several seeks in the encoding, as the decoder traverses the hash tree level-by-level.) In these situations, rather than trying to seek remotely, clients can instead request an encoded slice containing the range of content bytes they need. Creating a slice requires the sender to seek over the full encoding, but the recipient can then stream the slice without seeking at all. Decoding a slice uses the same root hash as regular decoding, so it doesn't require any preparation in advance from the sender or the recipient.
Use case: A BitTorrent-like application could fetch different slices of a file from different peers, without needing to define the slices ahead of time. Or a distributed file storage application could request random slices of an archived file from its storage providers, to prove that they're honestly storing the file, without needing to prepare or store challenges for the future.
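```sh
# Using the encoded file from above, extract a slice covering 100000
# content bytes starting at offset 500000.
bao slice 500000 100000 f.bao f.slice

# Look at the size of the slice.
stat -c "%n %s" f.slice
# f.slice 107272

# Decode the slice using the same hash as before.
bao decode-slice $hash 500000 100000 < f.slice > f.slice.out

# Confirm that the output matches the corresponding range of the input.
# (Note that tail numbers bytes starting with 1.)
tail --bytes=+500001 f | head -c 100000 > expected.out
cmp f.slice.out expected.out

# Decoding the slice with the wrong hash results in an error.
bao decode-slice $bad_hash 500000 100000 < f.slice
# Error: Custom { kind: InvalidData, error: StringError("hash mismatch") }
```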
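The 107272-byte slice above can also be accounted for. The sketch below assumes, following the Bao spec as I understand it, that a slice holds the 8-byte header, every whole chunk that overlaps the requested range, and every parent node whose subtree overlaps it, with the left subtree always taking the largest power of two of chunks strictly below the total. It only handles a non-empty range that lies inside the content, and the function names are illustrative.

```python
CHUNK_SIZE = 1024   # content bytes per leaf chunk (assumed from the Bao spec)
PARENT_SIZE = 64    # two 32-byte child hashes per parent node
HEADER_SIZE = 8     # content length header

def left_chunks(n: int) -> int:
    """Chunks in the left subtree: the largest power of two strictly below n (n >= 2)."""
    return 1 << ((n - 1).bit_length() - 1)

def slice_parents(first: int, n: int, lo: int, hi: int) -> int:
    """Count parent nodes over chunks [first, first + n) whose subtree overlaps chunks lo..hi."""
    if n == 1 or first > hi or first + n - 1 < lo:
        return 0
    left = left_chunks(n)
    return (1
            + slice_parents(first, left, lo, hi)
            + slice_parents(first + left, n - left, lo, hi))

def slice_size(content_len: int, start: int, count: int) -> int:
    total_chunks = max(1, -(-content_len // CHUNK_SIZE))
    lo = start // CHUNK_SIZE                 # first chunk touched by the range
    hi = (start + count - 1) // CHUNK_SIZE   # last chunk touched by the range
    # Whole chunks are included, clipped at the end of the content.
    chunk_bytes = min(content_len, (hi + 1) * CHUNK_SIZE) - lo * CHUNK_SIZE
    parents = slice_parents(0, total_chunks, lo, hi)
    return HEADER_SIZE + chunk_bytes + parents * PARENT_SIZE

print(slice_size(1_000_000, 500_000, 100_000))  # 107272
```

Here the range touches chunks 488 through 585 (98 chunks, 100352 bytes) and 108 parent nodes (6912 bytes), which together with the header gives 107272.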
By default, all of the operations above work with a "combined" encoded file, that is, one that contains both the content bytes and the tree hash bytes interleaved. However, sometimes you want to keep them separate, for example to avoid duplicating a very large input file. In these cases, you can use the "outboard" encoding format, via the `--outboard` flag:
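```sh
# Re-encode the input file from above in outboard mode.
bao encode f --outboard f.obao

# Compare the sizes of all three files.
stat -c "%n %s" f f.bao f.obao | column -t
# f       1000000
# f.bao   1062472
# f.obao  62472

# Decode the whole file in outboard mode, and compare with the original.
bao decode $hash f --outboard f.obao f4
cmp f f4
```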
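The outboard file size follows from the same layout with the content bytes left out. This sketch assumes an 8-byte header and one 64-byte parent node per internal tree node over 1024-byte chunks, as described in the Bao spec as I understand it; `outboard_size` is an illustrative name, not a function from the `bao` crate.

```python
CHUNK_SIZE = 1024   # content bytes per leaf chunk (assumed from the Bao spec)
PARENT_SIZE = 64    # two 32-byte child hashes per parent node
HEADER_SIZE = 8     # content length header

def outboard_size(content_len: int) -> int:
    # Ceiling division; an empty input still counts as one (empty) chunk.
    chunks = max(1, -(-content_len // CHUNK_SIZE))
    # Only the header and the parent nodes are stored; the content stays in the input file.
    return HEADER_SIZE + (chunks - 1) * PARENT_SIZE

print(outboard_size(1_000_000))  # 62472
```

This is the combined size of 1062472 minus the 1000000 content bytes, so keeping the tree outboard costs no extra space over the combined format.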
The `bao` command line utility is published on crates.io as the `bao_bin` crate. To install it, add `~/.cargo/bin` to your `PATH` and then run:
```sh
cargo install bao_bin
```
To build the binary directly from this repo:
```sh
git clone https://github.com/oconnor663/bao
cd bao/bao_bin
cargo build --release
./target/release/bao --help
```
`tests/bao.py` is a fully functional second implementation in Python, designed to be as short and readable as possible. It's a good starting point for understanding the algorithms involved, before diving into the Rust code.