```rust
use bio_seq::*;

let seq = dna!("ACTGCTAGCA");

for kmer in seq.kmers::<8>() {
    println!("{}", kmer);
}
```
bio_seq::alphabet::Dna
: DNA uses the lexicographically ordered 2-bit representation
bio_seq::alphabet::Iupac
: IUPAC nucleotide ambiguity codes are represented with 4 bits
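```
  A C G T
S 0 1 1 0
```

```rust
// `|` gives the union of the ambiguity sets at each position
assert_eq!(
    format!("{}", iupac!("AS-GYTNA") | iupac!("ANTGCAT-")),
    "ANTGYWNA"
);

// `&` gives the intersection of the ambiguity sets at each position
assert_eq!(
    format!("{}", iupac!("ACGTSWKM") & iupac!("WKMSTNNA")),
    "A----WKA"
);
```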
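To make the 4-bit idea concrete, here is a minimal standalone sketch that does not use `bio_seq` at all. It assumes one bit per base in the column order `A C G T` (so `S` is `0b0110`, as in the table above). The helper names `encode`, `decode`, `union` and `intersection` are hypothetical, and the crate's internal layout may differ.

```rust
/// Hypothetical encoding: one bit per base, ordered A C G T
/// (A = 0b1000, C = 0b0100, G = 0b0010, T = 0b0001).
fn encode(symbol: char) -> Option<u8> {
    let bits = match symbol {
        '-' => 0b0000, // gap: no base
        'A' => 0b1000,
        'C' => 0b0100,
        'G' => 0b0010,
        'T' => 0b0001,
        'M' => 0b1100, // A or C
        'R' => 0b1010, // A or G
        'W' => 0b1001, // A or T
        'S' => 0b0110, // C or G
        'Y' => 0b0101, // C or T
        'K' => 0b0011, // G or T
        'V' => 0b1110, // A, C or G
        'H' => 0b1101, // A, C or T
        'D' => 0b1011, // A, G or T
        'B' => 0b0111, // C, G or T
        'N' => 0b1111, // any base
        _ => return None,
    };
    Some(bits)
}

/// Inverse of `encode`: the table is indexed by the 4-bit value.
fn decode(bits: u8) -> char {
    const SYMBOLS: [char; 16] = [
        '-', 'T', 'G', 'K', 'C', 'Y', 'S', 'B',
        'A', 'W', 'R', 'D', 'M', 'H', 'V', 'N',
    ];
    SYMBOLS[(bits & 0b1111) as usize]
}

/// Applies `op` position by position; returns None on unequal
/// lengths or on a symbol outside the IUPAC alphabet.
fn combine(a: &str, b: &str, op: fn(u8, u8) -> u8) -> Option<String> {
    if a.chars().count() != b.chars().count() {
        return None;
    }
    a.chars()
        .zip(b.chars())
        .map(|(x, y)| Some(decode(op(encode(x)?, encode(y)?))))
        .collect()
}

/// Bitwise OR: a base is allowed if either input allows it.
fn union(a: &str, b: &str) -> Option<String> {
    combine(a, b, |x, y| x | y)
}

/// Bitwise AND: a base is allowed only if both inputs allow it.
fn intersection(a: &str, b: &str) -> Option<String> {
    combine(a, b, |x, y| x & y)
}
```

Under this encoding, `union("AS-GYTNA", "ANTGCAT-")` and `intersection("ACGTSWKM", "WKMSTNNA")` reproduce the two strings asserted above.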
TODO bio_seq::alphabet::amino
: Amino acid sequences
Kmers are DNA sequences of fixed length. The length is a const generic parameter, so `Kmer<8>` and `Kmer<9>` are distinct types.
The 2-bit representation of DNA sequences is lexicographically ordered:
```rust
// find the lexicographically minimum 8-mer
fn minimise(seq: Seq<Dna>) -> Option<Kmer<8>> {
    seq.kmers::<8>().min()
}
```
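Why `min()` finds the lexicographic minimum can be illustrated with a standalone sketch. It assumes the codes `A=0, C=1, G=2, T=3` and that the first base sits in the most significant bits, so that comparing the packed integers agrees with comparing the strings. `PackedKmer`, `from_bases`, `unpack` and `kmers` are hypothetical names for illustration, not the crate's API, and the crate's actual bit layout may differ.

```rust
/// Hypothetical packed k-mer: K bases at 2 bits each in a u64 (K <= 32).
/// The derived Ord compares the packed integer.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct PackedKmer<const K: usize>(u64);

impl<const K: usize> PackedKmer<K> {
    /// Packs exactly K bases, first base in the most significant position.
    /// Returns None on a wrong length or a non-ACGT byte.
    fn from_bases(bases: &[u8]) -> Option<Self> {
        if bases.len() != K || K > 32 {
            return None;
        }
        let mut word = 0u64;
        for &b in bases {
            let code = match b {
                b'A' => 0,
                b'C' => 1,
                b'G' => 2,
                b'T' => 3,
                _ => return None,
            };
            word = (word << 2) | code;
        }
        Some(PackedKmer(word))
    }

    /// Unpacks back into an ASCII string of length K.
    fn unpack(self) -> String {
        (0..K)
            .map(|i| {
                let shift = 2 * (K - 1 - i);
                ['A', 'C', 'G', 'T'][((self.0 >> shift) & 0b11) as usize]
            })
            .collect()
    }
}

/// All overlapping K-mers of `seq`. Windows containing a non-ACGT byte
/// are skipped. K must be at least 1 (`slice::windows` panics on 0).
fn kmers<const K: usize>(seq: &[u8]) -> impl Iterator<Item = PackedKmer<K>> + '_ {
    seq.windows(K).filter_map(PackedKmer::<K>::from_bases)
}

/// Integer minimum of the packed values, which under the assumed
/// encoding is also the lexicographic minimum.
fn minimise<const K: usize>(seq: &[u8]) -> Option<PackedKmer<K>> {
    kmers::<K>(seq).min()
}
```

Because `K` is part of the type, a `PackedKmer<8>` cannot be compared with a `PackedKmer<9>` by accident, and the whole comparison is a single integer comparison.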
`rust-bio`: meant to replace `Text`/`TextSlice`

`u8` to 2-bit representation