FASTA and FASTQ parsing in Rust

This library provides an(other) attempt at high-performance FASTA and FASTQ parsing. It has many similarities to the excellent fastq-rs crate. However, the API provides streaming iterators where possible. Unlike fastq-rs, the parsers will not panic if a record is too large to fit into the buffer; instead, the buffer will grow until the record can be accommodated. How the buffer grows can be configured by choosing or customizing implementations of the BufGrowStrategy trait. The buf_redux library provides the underlying buffered reader. Byte copies are only done when the end of the buffer is reached and an incomplete record is moved to the start.
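For illustration, a minimal sketch of streaming over a FASTQ file might look as follows. It assumes a `seq_io::fastq::Reader` type with a buffer-borrowing `next()` method and a `Record` trait for field access; the exact names and signatures should be checked against the crate documentation for the version in use.

```rust
// Minimal sketch, assuming `seq_io::fastq::Reader` with a streaming `next()`
// method that yields records borrowing from the internal buffer.
use seq_io::fastq::{Reader, Record};
use std::fs::File;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut reader = Reader::new(File::open("input.fastq")?);

    // `next()` hands out references into the reader's buffer,
    // so no allocation is needed per record.
    while let Some(result) = reader.next() {
        let record = result?;
        // Example task: report records whose sequence contains an 'N'.
        if record.seq().contains(&b'N') {
            println!("{} contains N", record.id()?);
        }
    }
    Ok(())
}
```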

Note: Make sure to compile with LTO enabled because calls to buf_redux functions are not inlined otherwise.
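LTO can be enabled for release builds via standard Cargo configuration, for example:

```toml
# Cargo.toml of the crate that uses seq_io
[profile.release]
lto = true
```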

Multi-threaded processing

The functions from the parallel module make it possible to send FASTQ/FASTA records to a thread pool where expensive calculations are done. Sequences are processed in batches (RecordSet) because sending single records across channels has a performance impact. FASTA/FASTQ records can be accessed both in the 'worker' function and (after processing) in a function running in the main thread.
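To make the batching idea concrete, here is a small self-contained sketch using only std threads and channels. It is not the crate's `parallel` API; the `Vec<String>` batches merely stand in for a RecordSet, and the worker plays the role of the thread pool.

```rust
// Illustrative sketch of the batching pattern only, not the `parallel` module.
use std::sync::mpsc;
use std::thread;

fn main() {
    // Stand-in for batches of parsed records (a "record set").
    let batches: Vec<Vec<String>> = vec![
        vec!["ACGT".into(), "GGNA".into()],
        vec!["TTTT".into(), "NACG".into()],
    ];

    let (to_worker, worker_rx) = mpsc::channel::<Vec<String>>();
    let (to_main, main_rx) = mpsc::channel::<(Vec<String>, Vec<bool>)>();

    // Worker thread: does the expensive part (here: scanning for 'N').
    let worker = thread::spawn(move || {
        for batch in worker_rx {
            let results: Vec<bool> = batch.iter().map(|s| s.contains('N')).collect();
            // Send the batch back together with the per-record results.
            to_main.send((batch, results)).unwrap();
        }
    });

    // Main thread: sends whole batches instead of single records to keep
    // per-message channel overhead low, then consumes the results.
    for batch in batches {
        to_worker.send(batch).unwrap();
    }
    drop(to_worker); // close the channel so the worker terminates

    for (batch, results) in main_rx {
        for (seq, has_n) in batch.iter().zip(results) {
            println!("{}: contains N = {}", seq, has_n);
        }
    }
    worker.join().unwrap();
}
```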

Performance comparisons

All comparisons were run on a set of 100,000 auto-generated, synthetic sequences of uniform length (500 bp) loaded into memory. The parsers from this crate (seq_io) are compared with fastq-rs and Rust-Bio (bio). The bars represent the throughput in GB/s; the error bars show the ± deviation inferred from the values reported by cargo bench, that is, (max_time - min_time) / 2 of the time used per iteration. The benchmarks were run on a Mac Pro (Mid 2010, 2.8 GHz Quad-Core Intel Xeon, OS X 10.9.5) using a Rust 1.19 nightly build.

[Benchmark charts: FASTA readers and FASTQ readers]

Remarks