rustybam

Usage

```shell
rustybam [OPTIONS] <SUBCOMMAND>
```

or

```shell
rb [OPTIONS] <SUBCOMMAND>
```

Options and subcommands

```
rustybam 0.1.23
Mitchell R. Vollger mrvollger@gmail.com
bioinformatics toolkit in rust

USAGE:
    rb [OPTIONS] <SUBCOMMAND>

OPTIONS:
    -t, --threads    threads for decompression [default: 8]
    -v, --verbose    logging level
    -h, --help       Print help information
    -V, --version    Print version information

SUBCOMMANDS:
    stats          Get percent identity stats from a sam/bam/cram or PAF
    bed-length     Count the number of bases in a bed file [aliases: bedlen, bl, bedlength]
    filter         Filter PAF records in various ways
    invert         Invert the target and query sequences in a PAF along with the CIGAR string
    liftover       Liftover target sequence coordinates onto query sequence using a PAF
    trim-paf       Trim paf records that overlap in query sequence [aliases: trim, tp]
    orient         Orient paf records so that most of the bases are in the forward direction
    break-paf      Break PAF records with large indels into multiple records (useful for SafFire) [aliases: breakpaf, bp]
    paf-to-sam     Convert a PAF file into a SAM file. Warning, all alignments will be marked as primary! [aliases: paftosam, p2s, paf2sam]
    fasta-split    Reads in a fasta from stdin and divides into files (can compress by adding .gz) [aliases: fastasplit, fasplit]
    fastq-split    Reads in a fastq from stdin and divides into files (can compress by adding .gz) [aliases: fastqsplit, fqsplit]
    get-fasta      Mimic bedtools getfasta but allow for bgzip in both bed and fasta inputs [aliases: getfasta, gf]
    nucfreq        Get the frequencies of each bp at each position
    repeat         Report the longest exact repeat length at every position in a fasta
    suns           Extract the intervals in a genome (fasta) that are made up of SUNs
    help           Print this message or the help of the given subcommand(s)
```

Install

conda

```shell
mamba install -c bioconda rustybam
```

cargo

```shell
cargo install rustybam
```

Pre-compiled binaries

Download from releases (may be slower than locally compiled versions).

Source

```shell
git clone https://github.com/mrvollger/rustybam.git
cd rustybam
cargo build --release
```

and the executables will be built here:

```shell
target/release/{rustybam,rb}
```

Examples

PAF or BAM statistics

For BAM files with extended CIGAR operations (`=` and `X`), we can calculate statistics about the alignment and report them in BED format.

```shell
rustybam stats {input.bam} > {stats.bed}
```

The same can be done with PAF files, as long as they were generated with `-c --eqx` (as in minimap2) so that each record carries an extended CIGAR.

```shell
rustybam stats --paf {input.paf} > {stats.bed}
```
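To illustrate why the `=` and `X` operations are needed, here is a minimal sketch of how a percent identity could be derived from an extended CIGAR string. The `cigar_identity` helper and its formula (matches divided by matches plus mismatches, ignoring indels) are assumptions made for illustration; rustybam's own calculation may differ.

```shell
# Hypothetical helper (not part of rustybam): percent identity from an
# extended CIGAR string, counting only "=" (match) and "X" (mismatch).
cigar_identity() {
    printf '%s\n' "$1" | awk '{
        cigar = $0; matches = 0; mismatches = 0
        # Consume one <length><op> pair at a time from the front.
        while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
            len = substr(cigar, 1, RLENGTH - 1) + 0
            op  = substr(cigar, RLENGTH, 1)
            if (op == "=") matches += len
            else if (op == "X") mismatches += len
            cigar = substr(cigar, RLENGTH + 1)
        }
        # A plain "M" CIGAR cannot separate matches from mismatches.
        if (matches + mismatches == 0) { print "NA"; exit }
        printf "%.2f\n", 100 * matches / (matches + mismatches)
    }'
}

cigar_identity "90=5X3I5=2D"
```

With this definition the example CIGAR has 95 matching and 5 mismatching columns, so it prints `95.00`. A CIGAR that only uses `M` yields `NA`, which is why the input needs extended operations.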

PAF liftovers

I have a PAF and I want to subset it for just a particular region in the reference.

With rustybam it's easy:

```shell
rustybam liftover \
    --bed <(printf "chr1\t0\t250000000\n") \
    input.paf > trimmed.paf
```

But I also want the alignment statistics for the region.

No problem, rustybam liftover does not just trim the coordinates but also the CIGAR so it is ready for rustybam stats:

```shell
rustybam liftover \
    --bed <(printf "chr1\t0\t250000000\n") \
    input.paf \
    | rustybam stats --paf \
    > trimmed.stats.bed
```

Okay, but Evan asked for an "align slider" so I need to realign in chunks.

No need, just make your bed query to rustybam liftover a set of sliding windows and it will do the rest.

```shell
rustybam liftover \
    --bed <(bedtools makewindows -w 100000 -b \
        <(printf "chr1\t0\t250000000\n") \
    ) \
    input.paf \
    | rustybam stats --paf \
    > trimmed.stats.bed
```
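For reference, the window file passed to `--bed` is nothing more than consecutive fixed-size BED intervals. The sketch below shows what such a file looks like using a hypothetical `make_windows` helper written in awk; it is only an illustration of the expected input, not a replacement for bedtools.

```shell
# Hypothetical helper: print consecutive fixed-size windows over one
# interval as BED lines (chrom, start, end), clipping the last window.
make_windows() {
    # $1 = chrom, $2 = start, $3 = end, $4 = window size
    awk -v chrom="$1" -v start="$2" -v end="$3" -v size="$4" 'BEGIN {
        # Force numeric interpretation of the command-line values.
        start += 0; end += 0; size += 0
        for (s = start; s < end; s += size) {
            e = s + size
            if (e > end) e = end
            printf "%s\t%d\t%d\n", chrom, s, e
        }
    }'
}

make_windows chr1 0 250000 100000
```

This prints three tab-separated intervals, with the last one clipped to end at 250000. Each line becomes one liftover query, so every window gets its own trimmed PAF record and its own row of statistics.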

You can also use rustybam breakpaf to break PAF records at indels above a certain size, giving more "miropeats"-like intervals.

```shell
rustybam breakpaf --max-size 1000 input.paf \
    | rustybam liftover \
        --bed <(printf "chr1\t0\t250000000\n") \
    | rustybam stats --paf \
    > trimmed.stats.bed
```

Yeah but how do I visualize the data?

Try out SafFire!

Split fastx files

Split a fasta file between stdout and two other files, one compressed and one uncompressed.

```shell
cat {input.fasta} | rustybam fasta-split two.fa.gz three.fa
```

Split a fastq file between stdout and two other files, one compressed and one uncompressed.

```shell
cat {input.fastq} | rustybam fastq-split two.fq.gz three.fq
```
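As a rough picture of what splitting a stream of records involves, the sketch below deals fasta records from stdin to a list of output files. The `split_fasta_round_robin` helper and the round-robin order are assumptions for illustration only; this README does not state how rustybam distributes records, and the sketch does no compression.

```shell
# Hypothetical helper: deal fasta records from stdin to the given output
# files in round-robin order (an assumed scheme, for illustration only).
# File names must not contain spaces in this simplified version.
split_fasta_round_robin() {
    awk -v outs="$*" 'BEGIN { n = split(outs, files, " "); rec = -1 }
        # Each header line starts a new record.
        /^>/ { rec++ }
        # Write every line of a record to the file chosen for that record.
        rec >= 0 { out = files[(rec % n) + 1]; print > out }'
}
```

For example, `cat input.fasta | split_fasta_round_robin one.fa two.fa` would send the first record to `one.fa`, the second to `two.fa`, the third to `one.fa` again, and so on.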

Extract from a fasta

This tool is designed to mimic bedtools getfasta, but it also allows the fasta to be bgzipped.

```shell
samtools faidx {seq.fa(.gz)}
rb get-fasta --name --strand --bed {regions.of.interest.bed} --fasta {seq.fa(.gz)}
```
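BED intervals are 0-based and half-open, which is easy to get wrong when checking extracted sequence by hand. The sketch below shows the coordinate arithmetic on a plain sequence string using a hypothetical `bed_slice` helper. It assumes that `--strand` behaves like `bedtools getfasta -s` and reverse-complements features on the minus strand; it is not rustybam's code.

```shell
# Hypothetical helper: extract [start, end) from a sequence string using
# BED-style 0-based, half-open coordinates; reverse-complement on "-".
bed_slice() {
    # $1 = sequence, $2 = start, $3 = end, $4 = strand (+ or -)
    awk -v seq="$1" -v start="$2" -v end="$3" -v strand="$4" 'BEGIN {
        # awk strings are 1-based, so shift the 0-based start by one.
        piece = substr(seq, start + 1, end - start)
        if (strand == "-") {
            comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
            out = ""
            # Walk the slice backwards, complementing each base;
            # anything other than A, C, G, T becomes N.
            for (i = length(piece); i >= 1; i--) {
                base = substr(piece, i, 1)
                out = out ((base in comp) ? comp[base] : "N")
            }
            piece = out
        }
        print piece
    }'
}

bed_slice ACGTTGCA 2 5 +
bed_slice ACGTTGCA 2 5 -
```

The interval `2-5` covers the third through fifth bases, so the two calls print `GTT` and its reverse complement `AAC`.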

TODO