Nushell bio

A bioinformatics plugin for nushell. This plugin parses most common bioinformatics formats into structured data so you can use them with nushell more effectively.

Quick setup

Go and get nushell, it's great. I'm assuming you have the rust toolchain installed. Then come back!

```nu

clone this repo

git clone https://github.com/Euphrasiologist/nupluginbio

change into the repo directory

cd nupluginbio

build

it's quite a long compile time...

cargo build --release

register the plugin

register nupluginbio/target/release/nupluginbio

see the current file formats currently supported below

now you can just use open, and the file extension will be auto-detected.

there are some test files in the tests/ dir.

open ./tests/test.fasta | get id

if you want to add flags you have to explicitly use from

e.g. if you want descriptions in fasta files to be parsed.

open --raw ./tests/test.fasta | from fasta -d | first ```

The backend is a noodles wrapper, an excellent, all-Rust bioinformatics I/O library.

Aims

Aim to support the following: - [x] BAM 1.6 - [x] BCF 2.2 - [x] bcf.gz - [x] VCF 4.3 - [x] vcf.gz - [x] BED(3 only right now) - [x] CRAM 3.0 - [x] FASTA - [x] fa.gz - [x] FASTQ - [x] fq.gz - [x] GFF3 - [ ] GTF 2.2 - [x] SAM 1.6 - [x] GFA 1.0 - [x] gfa.gz

Note that performance will not be optimal with the current state of nu_plugin, as we cannot access the engine state of nushell, and therefore need to load entire data structures into memory. Testing still needs to be done on large files.

More?

If there's a bioinformatics format you want to add, let me know, or add a PR.