`kseq` is a simple fasta/fastq (fastx) format parser library for Rust. Its main function is to iterate over the records in fastx files (similar to kseq in C). It reads and stores records in a shared buffer, which keeps parsing fast. It supports plain or gz fastx files, `io::stdin`, and fofn (file-of-file-names) files, which list multiple plain or gz fastx files (one per line).
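To make the supported input kinds concrete, here is a minimal sketch of how a reader could tell them apart from the first bytes and expand a fofn into paths. It is not `kseq`'s implementation: `InputKind`, `classify` and `expand_fofn` are hypothetical names, and treating any unrecognized input as a fofn, trimming whitespace and skipping blank lines are all assumptions. The gzip magic number (`0x1f 0x8b`) and the `>` / `@` header characters are standard.

```rust
/// Kinds of input a fastx reader might need to tell apart (hypothetical type).
#[derive(Debug, PartialEq)]
enum InputKind {
    Gzip,
    Fasta,
    Fastq,
    Fofn, // assumption: anything unrecognized is a list of file names
}

/// Guess the input kind from its first bytes (hypothetical helper).
fn classify(first_bytes: &[u8]) -> InputKind {
    match first_bytes {
        [0x1f, 0x8b, ..] => InputKind::Gzip, // gzip magic number
        [b'>', ..] => InputKind::Fasta,      // FASTA headers start with '>'
        [b'@', ..] => InputKind::Fastq,      // FASTQ headers start with '@'
        _ => InputKind::Fofn,
    }
}

/// Expand fofn text into file paths, one per line.
/// Trimming whitespace and skipping blank lines are assumptions.
fn expand_fofn(text: &str) -> Vec<String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

fn main() {
    println!("{:?}", classify(b">seq1\nACGT\n"));
    println!("{:?}", classify(&[0x1f, 0x8b, 0x08]));
    println!("{:?}", expand_fofn("reads_1.fq.gz\n\nreads_2.fa\n"));
}
```

In this sketch a gz file listed inside a fofn would be classified again when it is opened, so the two helpers compose naturally.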
Using `kseq` is very simple: call `parse_path` to open the input, then call the `iter_record` method in a loop to get each record.
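The calling pattern is a loop that pulls one record at a time until the stream ends. The sketch below imitates that shape with a toy in-memory reader so it runs without any dependency. `ToyReader`, `ToyRecord` and `summarize` are hypothetical names, not `kseq` types. The toy handles only single-line fasta, splits the header at the first space (an assumption), and uses `String` as its error type, whereas `kseq` reports a `ParseError`.

```rust
/// Toy record that mirrors the accessor names `kseq` documents (not kseq's type).
struct ToyRecord<'a> {
    head: &'a str,
    des: &'a str,
    seq: &'a str,
}

impl<'a> ToyRecord<'a> {
    fn head(&self) -> &str { self.head }
    fn des(&self) -> &str { self.des } // "" when the header has no description
    fn seq(&self) -> &str { self.seq }
    fn len(&self) -> usize { self.seq.len() }
}

/// Toy reader over in-memory, single-line fasta text.
struct ToyReader<'a> {
    lines: std::str::Lines<'a>,
}

impl<'a> ToyReader<'a> {
    fn new(text: &'a str) -> Self {
        ToyReader { lines: text.lines() }
    }

    /// Next record, `Ok(None)` at end of input, or `Err` on malformed input.
    fn iter_record(&mut self) -> Result<Option<ToyRecord<'a>>, String> {
        let header = match self.lines.next() {
            Some(line) => line,
            None => return Ok(None),
        };
        let header = header
            .strip_prefix('>')
            .ok_or_else(|| format!("expected '>', found {:?}", header))?;
        // Split "id description" at the first space; no space means no description.
        let (head, des) = header.split_once(' ').unwrap_or((header, ""));
        let seq = self
            .lines
            .next()
            .ok_or_else(|| String::from("truncated record"))?;
        Ok(Some(ToyRecord { head, des, seq }))
    }
}

/// Drive the reader to the end, propagating errors with `?` instead of `unwrap`.
fn summarize(text: &str) -> Result<Vec<String>, String> {
    let mut reader = ToyReader::new(text);
    let mut out = Vec::new();
    while let Some(record) = reader.iter_record()? {
        out.push(format!(
            "{}|{}|{}|{}",
            record.head(),
            record.des(),
            record.seq(),
            record.len()
        ));
    }
    Ok(out)
}

fn main() {
    let text = ">seq1 first read\nACGT\n>seq2\nGG\n";
    for line in summarize(text).unwrap() {
        println!("{}", line);
    }
}
```

The `while let Some(record) = reader.iter_record()?` line is the part that carries over: `Ok(None)` ends the loop cleanly, while an `Err` leaves the function early through `?`.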
parse_path
This function takes a path (`Option<String>`) as input. The path can be a fastx file, `None` or `-` for `io::stdin`, or a fofn file. It returns a `Result` type:

- `Ok(T)`: a struct `T` with the `iter_record` method.
- `Err(E)`: an error `E`, returned when the input cannot be opened or read, the fastx format is wrong, or the path or file is invalid.

iter_record
This function can be called in a loop. It returns a `Result<Option<Record>>` type:

- `Ok(Some(Record))`: a struct `Record` with methods:
  - `head -> &str`: get sequence id/identifier
  - `seq -> &str`: get sequence
  - `des -> &str`: get sequence description/comment
  - `sep -> &str`: get separator
  - `qual -> &str`: get quality scores
  - `len -> usize`: get sequence length

  Note: calling `des`, `sep` or `qual` returns `""` if the `Record` doesn't have these attributes.
- `Ok(None)`: the stream has reached EOF.
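- `Err(ParseError)`: an error `ParseError`, which is one of `IO`, `TruncateFile`, `InvalidFasta` or `InvalidFastq`.

```
use std::env::args;
use kseq::parse_path;

fn main() {
    // First command-line argument, if any, is the input path.
    let path: Option<String> = args().nth(1);
    let mut records = parse_path(path).unwrap();
    // Loop until `iter_record` returns `Ok(None)` at EOF.
    while let Some(record) = records.iter_record().unwrap() {
        println!("{}\t{}", record.head(), record.len());
    }
}
```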
The benchmarks can be run with `cargo bench`. We benchmarked `kseq` against Needletail v0.4.1 by parsing 500 megabases in multi-line fasta format and 4-line fastq format. The results are as follows:
```
FASTQ parsing/kseq time: [945.98 ms 974.99 ms 1.0052 s]
FASTQ parsing/needletail
time: [1.0133 s 1.0323 s 1.0527 s]
FASTA parsing/kseq time: [531.90 ms 544.50 ms 559.56 ms]
FASTA parsing/needletail
time: [620.42 ms 632.76 ms 649.72 ms]
Found 2 outliers among 10 measurements (20.00%)
2 (20.00%) high severe
```
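To put these timings in more familiar terms, the sketch below converts them into throughput and relative speed. It assumes that the middle value of each reported interval is the point estimate and that each timing covers one full pass over the 500 megabases. The helper names `throughput` and `speedup` are hypothetical.

```rust
/// Throughput in megabases per second for `megabases` parsed in `seconds`.
fn throughput(megabases: f64, seconds: f64) -> f64 {
    megabases / seconds
}

/// How many times faster the `fast_seconds` run is than the `slow_seconds` run.
fn speedup(slow_seconds: f64, fast_seconds: f64) -> f64 {
    slow_seconds / fast_seconds
}

fn main() {
    let megabases = 500.0;
    // Middle values of the intervals above, converted to seconds.
    let (fastq_kseq, fastq_needletail) = (0.97499, 1.0323);
    let (fasta_kseq, fasta_needletail) = (0.54450, 0.63276);

    println!(
        "FASTQ: kseq {:.0} Mbases/s, {:.2}x needletail",
        throughput(megabases, fastq_kseq),
        speedup(fastq_needletail, fastq_kseq)
    );
    println!(
        "FASTA: kseq {:.0} Mbases/s, {:.2}x needletail",
        throughput(megabases, fasta_kseq),
        speedup(fasta_needletail, fasta_kseq)
    );
}
```

Under those assumptions `kseq` comes out roughly 6% faster on fastq and roughly 16% faster on fasta. The fastq intervals of the two parsers lie close together, so the smaller difference should be read with some caution.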
To use `kseq`, add it to your `Cargo.toml`:

[dependencies]
kseq = "0.2"