rustybam is a bioinformatics toolkit written in the Rust programming language, focused on manipulating alignment (bam and PAF), annotation (bed), and sequence (fasta and fastq) files.
```shell
rustybam [OPTIONS] <SUBCOMMAND>
```
or
```shell
rb [OPTIONS] <SUBCOMMAND>
```
The full manual of subcommands can be found in the docs.
```shell
SUBCOMMANDS:
    stats          Get percent identity stats from a sam/bam/cram or PAF
    bed-length     Count the number of bases in a bed file [aliases: bedlen, bl, bedlength]
    filter         Filter PAF records in various ways
    invert         Invert the target and query sequences in a PAF along with the CIGAR string
    liftover       Liftover target sequence coordinates onto query sequence using a PAF
    trim-paf       Trim paf records that overlap in query sequence [aliases: trim, tp]
    orient         Orient paf records so that most of the bases are in the forward direction
    break-paf      Break PAF records with large indels into multiple records (useful for
                   SafFire) [aliases: breakpaf, bp]
    paf-to-sam     Convert a PAF file into a SAM file. Warning, all alignments will be marked as
                   primary! [aliases: paftosam, p2s, paf2sam]
    fasta-split    Reads in a fasta from stdin and divides into files (can compress by adding
                   .gz) [aliases: fastasplit, fasplit]
    fastq-split    Reads in a fastq from stdin and divides into files (can compress by adding
                   .gz) [aliases: fastqsplit, fqsplit]
    get-fasta      Mimic bedtools getfasta but allow for bgzip in both bed and fasta inputs
                   [aliases: getfasta, gf]
    nucfreq        Get the frequencies of each bp at each position
    repeat         Report the longest exact repeat length at every position in a fasta
    suns           Extract the intervals in a genome (fasta) that are made up of SUNs
    help           Print this message or the help of the given subcommand(s)
```
```shell
mamba install -c bioconda rustybam
```

```shell
cargo install rustybam
```
Download from releases (may be slower than locally compiled versions).
```shell
git clone https://github.com/mrvollger/rustybam.git
cd rustybam
cargo build --release
```
and the executables will be built here:
```shell
target/release/{rustybam,rb}
```
For BAM files with extended CIGAR operations, we can calculate statistics about the alignment and report them in BED format.
```shell
rustybam stats {input.bam} > {stats.bed}
```
The same can be done with PAF files as long as they are generated with -c --eqx.
```shell
rustybam stats --paf {input.paf} > {stats.bed}
```
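The extended CIGAR matters because `=` and `X` operations distinguish matches from mismatches, while a plain `M` hides that difference (in minimap2, `-c` writes the CIGAR into the PAF and `--eqx` switches it to `=`/`X`). As a minimal sketch of why this is needed, the hypothetical helper `cigar_identity` below computes one simple definition of percent identity, matches over matches plus mismatches, from a single CIGAR string. It is an illustration only: the helper name and the chosen definition are assumptions, and the statistics that rustybam stats actually reports may be defined differently.

```shell
# Sketch: percent identity from an extended CIGAR (=/X/I/D operations).
# cigar_identity is a hypothetical helper, not part of rustybam.
cigar_identity() {
    # Split the CIGAR into one "<length><op>" token per line, then tally by op.
    printf '%s\n' "$1" | grep -oE '[0-9]+[=XIDM]' | awk '
        {
            op = substr($0, length($0))          # last character is the operation
            n  = substr($0, 1, length($0) - 1)   # everything before it is the length
            count[op] += n
        }
        END {
            if (count["M"] > 0) {
                # M lumps matches and mismatches together, so identity is unknowable.
                print "error: M operations found; regenerate with =/X operations" > "/dev/stderr"
                exit 1
            }
            aligned = count["="] + count["X"]
            if (aligned == 0) { print "NA"; exit }
            printf "%.2f\n", 100 * count["="] / aligned
        }'
}

cigar_identity "90=10X5I3D"   # prints 90.00
```

Note that this definition ignores insertions and deletions entirely; a definition that counts indels as events or as bases would give a lower number for the same alignment.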
I have a PAF and I want to subset it to just a particular region in the reference. With rustybam it's easy:
```shell
rustybam liftover \
    --bed <(printf "chr1\t0\t250000000\n") \
    input.paf > trimmed.paf
```
But I also want the alignment statistics for the region. No problem, rustybam liftover does not just trim the coordinates but also the CIGAR, so it is ready for rustybam stats:
```shell
rustybam liftover \
    --bed <(printf "chr1\t0\t250000000\n") \
    input.paf \
    | rustybam stats --paf \
    > trimmed.stats.bed
```
Okay, but Evan asked for an "align slider" so I need to realign in chunks. No need, just make the bed you pass to rustybam liftover a set of sliding windows and it will do the rest.
```shell
rustybam liftover \
    --bed <(bedtools makewindows -w 100000 \
        -b <(printf "chr1\t0\t250000000\n") \
        ) \
    input.paf \
    | rustybam stats --paf \
    > trimmed.stats.bed
```
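If bedtools is not at hand, the window BED itself is easy to produce. The following is a minimal sketch using awk; `make_windows` is a hypothetical helper name, and its optional fifth argument (a step smaller than the window size, giving overlapping windows) is an illustrative addition rather than anything rustybam requires.

```shell
# Sketch: emit fixed-size windows across one interval as BED lines.
# make_windows is a hypothetical helper.
# Arguments: chrom start end window_size [step] (step defaults to window_size).
make_windows() {
    awk -v chrom="$1" -v start="$2" -v end="$3" -v w="$4" -v step="${5:-$4}" 'BEGIN {
        OFS = "\t"
        if (step + 0 <= 0) exit 1                 # a non-positive step would never finish
        for (s = start + 0; s < end + 0; s += step) {
            e = (s + w < end) ? s + w : end + 0   # clip the last window to the interval end
            print chrom, s, e
        }
    }'
}

make_windows chr1 0 250000 100000
# chr1    0       100000
# chr1    100000  200000
# chr1    200000  250000
```

The output can be passed to --bed through process substitution in the same way as the bedtools command above, for example with <(make_windows chr1 0 250000000 100000).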
You can also use rustybam breakpaf to break PAF records at indels above a certain size, which gives more "miropeats"-like intervals.
```shell
rustybam breakpaf --max-size 1000 input.paf \
    | rustybam liftover \
        --bed <(printf "chr1\t0\t250000000\n") \
    | rustybam stats --paf \
    > trimmed.stats.bed
```
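To make the idea of breaking at large indels concrete, here is a minimal sketch that splits a CIGAR string wherever an insertion or deletion is longer than a threshold. `split_cigar` is a hypothetical helper, and the strict "greater than" comparison is an assumption about how the size limit is applied. Unlike rustybam breakpaf, it only cuts the CIGAR and does not recompute the query and target coordinates of the resulting records.

```shell
# Sketch: split a CIGAR into pieces at indels longer than a threshold.
# split_cigar is a hypothetical helper. Arguments: cigar max_indel_size.
split_cigar() {
    printf '%s\n' "$1" | grep -oE '[0-9]+[=XIDM]' | awk -v max="$2" '
        {
            op = substr($0, length($0))              # operation character
            n  = substr($0, 1, length($0) - 1) + 0   # operation length as a number
            if ((op == "I" || op == "D") && n > max) {
                if (piece != "") print piece         # close the current piece at the large indel
                piece = ""
            } else {
                piece = piece $0                     # small operations stay in the piece
            }
        }
        END { if (piece != "") print piece }'
}

split_cigar "500=2000D300=5I200=" 1000
# 500=
# 300=5I200=
```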
Yeah but how do I visualize the data?
Try out SafFire!
Split a fasta file between stdout and two other files, one compressed and one uncompressed.
```shell
cat {input.fasta} | rustybam fasta-split two.fa.gz three.fa
```
Split a fastq file between stdout and two other files, one compressed and one uncompressed.
```shell
cat {input.fastq} | rustybam fastq-split two.fq.gz three.fq
```
This tool is designed to mimic bedtools getfasta, but it also allows the fasta to be bgzipped.
```shell
samtools faidx {seq.fa(.gz)}
rb get-fasta --name --strand --bed {regions.of.interest.bed} --fasta {seq.fa(.gz)}
```
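For readers unfamiliar with what a getfasta-style operation does, the sketch below cuts a single BED interval out of an uncompressed FASTA with awk. `extract_region` is a hypothetical helper; the `chrom:start-end` header follows the default naming used by bedtools getfasta, and nothing here reflects how rb get-fasta is implemented or how its --name and --strand options shape the output. The point it illustrates is the BED convention: the start is 0-based and the end is exclusive.

```shell
# Sketch: extract one BED interval (0-based start, exclusive end) from a FASTA.
# extract_region is a hypothetical helper. Arguments: fasta chrom start end.
extract_region() {
    awk -v chrom="$2" -v start="$3" -v end="$4" '
        /^>/ {
            name = substr($1, 2)         # sequence name is the first word after ">"
            next
        }
        name == chrom { seq = seq $0 }   # join wrapped sequence lines
        END {
            print ">" chrom ":" start "-" end
            print substr(seq, start + 1, end - start)   # awk strings are 1-based
        }' "$1"
}

# Small demonstration on a temporary two-record FASTA.
demo_fasta=$(mktemp)
printf '>chr1\nACGTAC\nGTTTGA\n>chr2\nCCCC\n' > "$demo_fasta"
extract_region "$demo_fasta" chr1 4 9
# >chr1:4-9
# ACGTT
rm -f "$demo_fasta"
```

This loads the whole sequence into memory, which is fine for a demonstration but is exactly what an indexed lookup (the samtools faidx step above) avoids on large genomes.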
trim-paf.
bedtools getfasta-like operation that actually works with bgzipped input.
D4 for Nucfreq.
suns.