ncbitaxonomy

This is a Rust crate (i.e. library) for working with a local copy of the NCBI Taxonomy database. The database can be downloaded (either taxdump.zip or taxdump.tar.gz) from the NCBI Taxonomy FTP site.
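To give a feel for the data involved, here is a minimal sketch of reading `nodes.dmp` into a child-to-parent map using only the standard library. It is not this crate's API: the function name `parse_nodes` is hypothetical. It relies on the NCBI dump format, where fields are separated by `\t|\t`, each line ends with `\t|`, and the first two columns are the tax ID and the parent tax ID.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

/// Hypothetical helper (not part of ncbitaxonomy): parse nodes.dmp content
/// into a map from tax ID to parent tax ID.
fn parse_nodes<R: BufRead>(reader: R) -> io::Result<HashMap<u32, u32>> {
    let mut parent_of = HashMap::new();
    for line in reader.lines() {
        let line = line?;
        // Drop the trailing "\t|" terminator, then split on the "\t|\t" separator.
        let mut fields = line.trim_end_matches("\t|").split("\t|\t");
        // Column 1 is the tax ID, column 2 is the parent tax ID.
        let tax_id = fields.next().and_then(|f| f.trim().parse::<u32>().ok());
        let parent = fields.next().and_then(|f| f.trim().parse::<u32>().ok());
        // Lines that do not parse are skipped rather than treated as errors.
        if let (Some(tax_id), Some(parent)) = (tax_id, parent) {
            parent_of.insert(tax_id, parent);
        }
    }
    Ok(parent_of)
}
```

The function accepts any `BufRead`, so it can be fed a `BufReader` over an opened `nodes.dmp` file or an in-memory byte slice when testing.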

Documentation for version 0.1.3 is available at crates.io.

taxonomyfilterrefseq

(new in 0.1.1)

A tool to filter an NCBI RefSeq FASTA file so that only sequences from descendants of a given taxon are retained.

```bash
$ taxonomyfilterrefseq --help
taxonomyfilterrefseq 0.1.2
Peter van Heusden pvh@sanbi.axc.za
Filter NCBI RefSeq FASTA files by taxonomic lineage

USAGE:
    taxonomyfilterrefseq [OPTIONS] [OUTPUT_FASTA]

FLAGS:
        --nocurated      Don't accept curated RNAs and proteins (NM_, NR_ and NP_ accessions)
        --nopredicted    Don't accept computationally predicted RNAs and proteins (XM_, XR_ and XP_ accessions)
    -h, --help           Prints help information
    -V, --version        Prints version information

OPTIONS:
    -t, --tax_prefix    String to prepend to names of nodes.dmp and names.dmp

ARGS:
    FASTA file with RefSeq sequences
    Directory containing the NCBI taxonomy nodes.dmp and names.dmp files
    Name of ancestor to use as ancestor filter
    Output FASTA filename (or stdout if omitted)
```
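The curated/predicted flags distinguish RefSeq records by accession prefix. The following is a rough sketch of that kind of classification, not the tool's actual code: `AccessionKind`, `classify_accession` and `keep_record` are hypothetical names. Keeping records with any other prefix is also an assumption of this sketch.

```rust
/// Hypothetical classification of a RefSeq accession by its prefix.
#[derive(Debug, PartialEq)]
enum AccessionKind {
    Curated,   // NM_, NR_, NP_
    Predicted, // XM_, XR_, XP_
    Other,     // anything else, e.g. NC_ genomic records
}

/// Classify a FASTA header line (with or without the leading '>').
fn classify_accession(header: &str) -> AccessionKind {
    // The accession is the first whitespace-delimited token of the header.
    let accession = header
        .trim_start_matches('>')
        .split_whitespace()
        .next()
        .unwrap_or("");
    match accession.get(..3) {
        Some("NM_") | Some("NR_") | Some("NP_") => AccessionKind::Curated,
        Some("XM_") | Some("XR_") | Some("XP_") => AccessionKind::Predicted,
        _ => AccessionKind::Other,
    }
}

/// Decide whether to keep a record given which kinds are accepted.
/// Assumption: records that are neither curated nor predicted are kept.
fn keep_record(header: &str, accept_curated: bool, accept_predicted: bool) -> bool {
    match classify_accession(header) {
        AccessionKind::Curated => accept_curated,
        AccessionKind::Predicted => accept_predicted,
        AccessionKind::Other => true,
    }
}
```

In this sketch, rejecting predicted records corresponds to calling `keep_record(header, true, false)`.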

taxonomyfilterfastq

(new in version 0.2.0)

```bash
$ taxonomyfilterfastq --help
taxonomyfilterrefseq 0.1.2
Peter van Heusden pvh@sanbi.axc.za
Filter NCBI RefSeq FASTA files by taxonomic lineage

USAGE:
    taxonomyfilterfastq [FLAGS] [OPTIONS] --ancestor_taxid --taxdir --tax_report_filename <--centrifuge|--kraken2>

FLAGS:
    -d, --output_dir    Directory to deposited filtered output files in
    -C, --centrifuge    Filter using report from Centrifuge
    -h, --help          Prints help information
    -K, --kraken2       Filter using report from Kraken2
    -V, --version       Prints version information

OPTIONS:
    -A, --ancestor_taxid                                    Name of ancestor to use as ancestor filter
    -T, --taxdir                                            Directory containing the NCBI taxonomy nodes.dmp and names.dmp files
    -t, --tax_prefix <TAXONOMY_FILENAME_PREFIX>             String to prepend to names of nodes.dmp and names.dmp
    -F, --tax_report_filename <TAXONOMY_REPORT_FILENAME>    Output from Kraken2 (default) or Centrifuge

ARGS:
    FASTA file with RefSeq sequences
```
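Filtering by an ancestor tax ID comes down to asking whether a read's assigned taxon lies beneath that ancestor in the taxonomy tree. Here is a minimal sketch of such a test, assuming a child-to-parent map built from `nodes.dmp`. The function `is_descendant` is a hypothetical name and does not describe how the tool itself is implemented. Treating a taxon as a descendant of itself is also an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical helper: return true if `taxid` equals `ancestor` or lies
/// beneath it, following parent links in `parent_of` (tax ID -> parent tax ID).
fn is_descendant(parent_of: &HashMap<u32, u32>, taxid: u32, ancestor: u32) -> bool {
    let mut current = taxid;
    // A valid tree never needs more steps than it has nodes; the bound
    // guards against cycles in malformed input.
    for _ in 0..=parent_of.len() {
        if current == ancestor {
            return true;
        }
        match parent_of.get(&current) {
            // Keep climbing while there is a distinct parent.
            Some(&parent) if parent != current => current = parent,
            // Reached the root (its own parent) or an unknown tax ID.
            _ => return false,
        }
    }
    false
}
```

With a map in which 1224 has parent 2, 2 has parent 131567, and 131567 has parent 1 (the root, which is its own parent), `is_descendant(&parent_of, 1224, 2)` is true, while a tax ID absent from the map is never reported as a descendant.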