FastCDC

This crate implements the "FastCDC" content-defined chunking algorithm in pure Rust. A critical aspect of its behavior is that it is deterministic: the same input with the same parameters always produces exactly the same chunks. To learn more about content-defined chunking and its applications, see the reference material linked below.

Requirements

Building and Testing

    $ cargo clean
    $ cargo build
    $ cargo test

Example Usage

Examples are coming soon; in the meantime, please consider this simple demonstration:

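    use std::fs;
    // Assumes `Chunk` and `FastCDC` are exported at the crate root.
    use fastcdc::{Chunk, FastCDC};

    // Read the whole file into memory; the chunker works on a byte slice.
    let read_result = fs::read("test/fixtures/SekienAkashita.jpg");
    assert!(read_result.is_ok());
    let contents = read_result.unwrap();

    // The sizes are the minimum, average, and maximum chunk size in bytes.
    let chunker = FastCDC::new(&contents, 16384, 32768, 65536);
    let results: Vec<Chunk> = chunker.collect();

    // Chunks are contiguous: each offset is the previous offset plus length.
    assert_eq!(results.len(), 3);
    assert_eq!(results[0].offset, 0);
    assert_eq!(results[0].length, 32857);
    assert_eq!(results[1].offset, 32857);
    assert_eq!(results[1].length, 16408);
    assert_eq!(results[2].offset, 49265);
    assert_eq!(results[2].length, 60201);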

Reference Material

The algorithm is as described in "FastCDC: a Fast and Efficient Content-Defined Chunking Approach for Data Deduplication"; see the paper and the presentation for details.
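
For intuition ahead of the paper, here is a minimal sketch of content-defined chunking in general. It is not the FastCDC algorithm and not this crate's implementation: it leaves out the gear table and normalized chunking described in the paper. The names `Span`, `toy_chunks`, and `scramble`, along with the `mask` parameter, are hypothetical and exist only for illustration. The sketch shows just the core idea, that cut points are picked from a rolling hash of the content itself, which is why the same input always yields the same chunks.

```rust
/// Hypothetical chunk descriptor for this sketch: byte offset and length.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Span {
    offset: usize,
    length: usize,
}

/// Hypothetical byte scrambler standing in for a gear lookup table.
fn scramble(byte: u8) -> u32 {
    (byte as u32 + 1).wrapping_mul(0x9E37_79B1)
}

/// Toy content-defined chunker (an illustration, not FastCDC).
/// A chunk ends once it holds at least `min` bytes and the rolling hash
/// has all `mask` bits clear, or when it reaches `max` bytes.
fn toy_chunks(data: &[u8], min: usize, mask: u32, max: usize) -> Vec<Span> {
    assert!(max > 0 && min <= max, "need 0 < max and min <= max");
    let mut spans = Vec::new();
    let mut start = 0;
    while start < data.len() {
        // Never read past the end of the input or past the maximum size.
        let limit = (data.len() - start).min(max);
        let mut length = limit;
        let mut hash: u32 = 0;
        for i in 0..limit {
            // Shift-and-add rolling hash: older bytes shift out of the top.
            hash = (hash << 1).wrapping_add(scramble(data[start + i]));
            if i + 1 >= min && (hash & mask) == 0 {
                length = i + 1;
                break;
            }
        }
        spans.push(Span { offset: start, length });
        start += length;
    }
    spans
}
```

Calling `toy_chunks(&data, 16, 0x3F, 256)` twice on the same `data` returns identical spans, and the spans always cover the input contiguously. A wider mask makes cut points rarer and so tends to give larger chunks.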

Prior Art

This crate is little more than a Rust rewrite of the implementation by Joran Dirk Greef (see the ronomon link below), with greatly simplified usage. One significant difference is that the chunker in this crate does not calculate a hash digest of the chunks.
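
Because the chunker reports only where each chunk starts and how long it is, a caller who needs digests can compute them from those two fields. The sketch below is one possible way to do that, under stated assumptions: `ChunkSpan` is a hypothetical stand-in that mirrors the `offset` and `length` fields used in the example usage above, `chunk_digests` is a hypothetical helper, and the standard library's `DefaultHasher` stands in for a real digest. `DefaultHasher` is not cryptographic and its algorithm is not guaranteed to stay the same across Rust releases, so a deduplication store would likely substitute a cryptographic hash from a separate crate.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical stand-in with the same two fields as the chunks above.
struct ChunkSpan {
    offset: usize,
    length: usize,
}

/// Computes one 64-bit digest per chunk by hashing the bytes it covers.
/// Panics if a span reaches past the end of `contents`.
fn chunk_digests(contents: &[u8], chunks: &[ChunkSpan]) -> Vec<u64> {
    chunks
        .iter()
        .map(|chunk| {
            let bytes = &contents[chunk.offset..chunk.offset + chunk.length];
            // A fresh hasher per chunk, so equal bytes give equal digests.
            let mut hasher = DefaultHasher::new();
            hasher.write(bytes);
            hasher.finish()
        })
        .collect()
}
```

Chunks with identical bytes receive identical digests within a run, which is the property deduplication relies on.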