krakenXtract


Extract reads from a FASTQ file based on taxonomic classification via Kraken2.

Written in Rust.

Background

I recently wanted to extract reads from a medium-sized (6 GB) FASTQ file (~5.5 million reads), based on taxonomic classifications. For that I used the great KrakenTools. However, it took a while both to parse the Kraken2 output file and to extract and write the matching reads. Having wanted to experiment with Rust for a while, this inspired me to re-implement the extract_kraken_reads.py script in Rust as a learning exercise.

This is currently an early implementation (and my first Rust programme!), with plans to expand functionality.

Current features

Benchmarks (rough)

For more detail see benchmarks

Installation

Download the latest release.

Alternatively, build from source:

Clone the repository:

    git clone https://github.com/Sam-Sims/krakenxtract

Install Rust/Cargo:

To install, please refer to the Rust documentation: docs

Build and add to path:

    cd krakenxtract
    cargo build --release
    export PATH=$PATH:$(pwd)/target/release

All executables will be in the directory krakenxtract/target/release.

Usage

    kraken-extract --kraken <kraken_output> --fastq <fastq_file> --taxid <taxonomic_id> --output <output_file>
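To illustrate roughly what a call like this has to do, the sketch below reads Kraken2's standard tab-separated output (classification flag, read ID, taxid, read length, LCA mapping), collects the IDs of reads assigned to one taxid, and keeps the matching FASTQ records. This is a minimal sketch and not the actual krakenXtract code: the function names `read_ids_for_taxid` and `filter_fastq` are hypothetical, everything is held in memory as strings, and it assumes four-line FASTQ records and a purely numeric taxid column.

```rust
use std::collections::HashSet;

/// Collect the IDs of reads that Kraken2 assigned to `taxid`.
/// Assumes the standard Kraken2 output columns:
/// C/U flag, read ID, taxid, read length, LCA mapping (tab-separated).
fn read_ids_for_taxid(kraken_output: &str, taxid: u32) -> HashSet<String> {
    kraken_output
        .lines()
        .filter_map(|line| {
            let mut fields = line.split('\t');
            let _flag = fields.next()?;
            let read_id = fields.next()?;
            // Lines with a missing or non-numeric taxid are skipped.
            let line_taxid: u32 = fields.next()?.trim().parse().ok()?;
            (line_taxid == taxid).then(|| read_id.to_string())
        })
        .collect()
}

/// Keep only the FASTQ records whose read ID is in `wanted`.
/// Assumes every record is exactly four lines long.
fn filter_fastq(fastq: &str, wanted: &HashSet<String>) -> String {
    let lines: Vec<&str> = fastq.lines().collect();
    let mut out = String::new();
    for record in lines.chunks(4) {
        if record.len() < 4 {
            continue; // ignore a truncated trailing record
        }
        // Header "@read1 some description" -> ID "read1"
        let header = record[0].strip_prefix('@').unwrap_or(record[0]);
        let id = header.split_whitespace().next().unwrap_or("");
        if wanted.contains(id) {
            for line in record {
                out.push_str(line);
                out.push('\n');
            }
        }
    }
    out
}

fn main() {
    // Tiny made-up inputs for illustration only.
    let kraken = "C\tread1\t562\t150\t562:116\n\
                  U\tread2\t0\t150\t0:116\n\
                  C\tread3\t562\t150\t562:116\n";
    let fastq = "@read1\nACGT\n+\nIIII\n@read2\nTTTT\n+\nIIII\n@read3\nGGGG\n+\nIIII\n";

    let wanted = read_ids_for_taxid(kraken, 562);
    print!("{}", filter_fastq(fastq, &wanted));
}
```

A real implementation would stream both files rather than load them whole, and would also need to handle gzipped input and multi-line FASTQ records.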

Arguments

    -k, --kraken <KRAKEN_OUTPUT>
    -t, --taxid <TAXID>
    -r, --report <REPORT_OUTPUT>
    -f, --fastq <FASTQ_FILE>
    -o, --output <OUTPUT_LOCATION>
        --compression-mode <COMPRESSION>  [default: fast]
        --parents
        --children
        --no-compress
        --exclude
    -h, --help                            Print help
    -V, --version                         Print version

--parents: This will extract all the reads classified at any taxon between the root and the specified --taxid

--children: This will extract all the reads classified as descendants or subtaxa of --taxid (including the taxid itself)

--compression-mode: This defines the compression mode of the output fastq.gz file - fast / default / best

--no-compress: This will output a plaintext fastq file

--exclude: This will output every read except those matching the taxid. Works with --parents and --children
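The way --parents, --children and --exclude combine can be pictured as first expanding the requested taxid into a set of taxids, then keeping or dropping each read depending on whether its taxid is in that set. The sketch below shows one possible way to do this, assuming the taxonomy is already available as a child-to-parent map (for example, one built from a Kraken2 report). The names `with_parents`, `with_children` and `keep_read` are hypothetical and are not taken from the krakenXtract source; including the taxid itself and the root in the --parents set is also an assumption.

```rust
use std::collections::{HashMap, HashSet};

/// Taxids on the path from `taxid` up to the root.
/// Assumption: both `taxid` itself and the root are included.
fn with_parents(taxid: u32, parent_of: &HashMap<u32, u32>) -> HashSet<u32> {
    let mut selected = HashSet::new();
    let mut current = taxid;
    // `insert` returns false on a repeat, which also stops on a
    // self-parented root or an accidental cycle.
    while selected.insert(current) {
        match parent_of.get(&current) {
            Some(&parent) => current = parent,
            None => break, // reached a node with no recorded parent
        }
    }
    selected
}

/// `taxid` plus every taxid that descends from it.
fn with_children(taxid: u32, parent_of: &HashMap<u32, u32>) -> HashSet<u32> {
    let mut selected = HashSet::from([taxid]);
    let mut stack = vec![taxid];
    while let Some(node) = stack.pop() {
        // Simple scan of the map; fine for a sketch, slow for large taxonomies.
        for (&child, &parent) in parent_of {
            if parent == node && selected.insert(child) {
                stack.push(child);
            }
        }
    }
    selected
}

/// Decide whether a read classified as `read_taxid` is written out.
/// With `exclude` set, the selection is inverted.
fn keep_read(read_taxid: u32, selected: &HashSet<u32>, exclude: bool) -> bool {
    selected.contains(&read_taxid) != exclude
}

fn main() {
    // Toy lineage (child -> parent) with intermediate ranks omitted:
    // 562 -> 561 -> 1224 -> 2 -> 1 (root)
    let parent_of = HashMap::from([(2, 1), (1224, 2), (561, 1224), (562, 561)]);

    let mut parents: Vec<u32> = with_parents(561, &parent_of).into_iter().collect();
    parents.sort();
    println!("--parents  selects {:?}", parents);

    let children = with_children(561, &parent_of);
    let mut sorted: Vec<u32> = children.iter().copied().collect();
    sorted.sort();
    println!("--children selects {:?}", sorted);

    println!("keep 562 with --exclude: {}", keep_read(562, &children, true));
}
```

Expanding the taxid set once up front keeps the per-read check to a single set lookup, which matters when the FASTQ file holds millions of reads.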

Future plans

Version

Changelog

0.2.3

0.2.2

0.2.1

0.2.0

0.1.0