The intent of this crate is to provide Rust functionality for querying a Multi-String BWT (MSBWT). It is mostly based on the same methodology used by the original msbwt.
NOTE: This is very much a work in progress and is currently only updated as a side project during spare time. If you have any feature requests, feel free to submit a new issue on GitHub. Here is a current list of planned additions:

- fmlrc2
- ropebwt2

All installation options assume you have installed Rust along with the `cargo` crate manager for Rust.
```bash
cargo install msbwt2
msbwt2-convert -h
```
```bash
git clone https://github.com/HudsonAlpha/rust-msbwt.git
cd rust-msbwt
cargo test --release
cargo build --release
./target/release/msbwt2-convert -h
```
The Multi-String Burrows-Wheeler Transform (MSBWT or BWT) must be built before any queries can be performed. Currently, there is no built-in builder, but we expect to include one soon. For now, the original msbwt instructions can be used.
Given a FASTQ file of reads (`reads.fq.gz`), you can also use the following command from this crate to create a BWT at `comp_msbwt.npy`.
Note that this command requires the `ropebwt2` executable to be installed:
```bash
gunzip -c reads.fq.gz | \
    awk 'NR % 4 == 2' | \
    sort | \
    tr NT TN | \
    ropebwt2 -LR | \
    tr NT TN | \
    msbwt2-convert comp_msbwt.npy
```
If you are only using the BWT for k-mer queries, then the `sort` can be removed from the above command.
This reduces construction time significantly, but the resulting BWT loses the read recovery property.
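
To make the first two stages of the pipeline concrete, here is a minimal, self-contained Rust sketch of what `awk 'NR % 4 == 2' | sort` does to a FASTQ stream. The function name `extract_reads` and the sample records are hypothetical and are not part of `msbwt2`. The sketch assumes four-line FASTQ records and plain byte-wise ordering, which is what `sort` produces under `LC_ALL=C`.

```rust
/// Hypothetical helper: keep the sequence line (the second line of every
/// four-line FASTQ record) and optionally sort the reads byte-wise.
fn extract_reads(fastq: &str, sort_reads: bool) -> Vec<String> {
    let mut reads: Vec<String> = fastq
        .lines()
        .enumerate()
        // awk numbers lines from 1, so `NR % 4 == 2` is `index % 4 == 1` here.
        .filter(|(i, _)| i % 4 == 1)
        .map(|(_, line)| line.trim_end().to_string())
        .collect();
    if sort_reads {
        // Mirrors the `sort` stage; skipping it keeps the file order.
        reads.sort();
    }
    reads
}

fn main() {
    // Three made-up records, used only for illustration.
    let fastq = "@r1\nGGTA\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nCATN\n+\nIIII\n";
    println!("{:?}", extract_reads(fastq, true)); // ["ACGT", "CATN", "GGTA"]
    println!("{:?}", extract_reads(fastq, false)); // ["GGTA", "ACGT", "CATN"]
}
```

Without the sort, the reads reach the BWT builder in file order rather than lexicographic order. That is the cheaper variant described above.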
The general use case of the library is k-mer queries, which can be performed as follows:
```rust
use msbwt2::msbwt_core::BWT;
use msbwt2::rle_bwt::RleBWT;
use msbwt2::string_util;

// Load a BWT stored in NumPy (.npy) format.
let mut bwt = RleBWT::new();
let filename: String = "test_data/two_string.npy".to_string();
bwt.load_numpy_file(&filename);

// Convert the k-mer to the integer encoding, then count its occurrences.
assert_eq!(bwt.count_kmer(&string_util::convert_stoi(&"ACGT")), 1);
```
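
For readers unfamiliar with what a k-mer count over a BWT means, the following self-contained sketch builds a tiny multi-string BWT by sorting rotations and counts a k-mer with FM-index backward search. It is an illustration only, not the `msbwt2` implementation: the functions `naive_msbwt` and `naive_count_kmer` are hypothetical, and the linear scans are stand-ins for whatever indexed counts a real structure such as `RleBWT` uses. It relies on ASCII ordering, in which `$` sorts before `A`, `C`, `G`, `N`, and `T`.

```rust
/// Hypothetical helper: build a multi-string BWT by sorting every rotation
/// of every '$'-terminated read. Only practical for tiny inputs.
fn naive_msbwt(reads: &[&str]) -> Vec<u8> {
    let mut rotations: Vec<Vec<u8>> = Vec::new();
    for read in reads {
        let mut s = read.as_bytes().to_vec();
        s.push(b'$');
        for i in 0..s.len() {
            let mut rot = s[i..].to_vec();
            rot.extend_from_slice(&s[..i]);
            rotations.push(rot);
        }
    }
    rotations.sort();
    // The BWT is the last symbol of each sorted rotation.
    rotations.iter().map(|r| *r.last().unwrap()).collect()
}

/// Hypothetical helper: count occurrences of a '$'-free k-mer using
/// backward search over the BWT.
fn naive_count_kmer(bwt: &[u8], kmer: &str) -> usize {
    // Occurrences of `c` in bwt[..end] (linear scan, for clarity only).
    let occ = |c: u8, end: usize| bwt[..end].iter().filter(|&&x| x == c).count();
    // Number of symbols in the whole BWT that sort strictly before `c`.
    let smaller = |c: u8| bwt.iter().filter(|&&x| x < c).count();

    // [lo, hi) is the range of sorted rotations prefixed by the suffix of
    // the k-mer processed so far; it starts as the whole BWT.
    let (mut lo, mut hi) = (0, bwt.len());
    for &c in kmer.as_bytes().iter().rev() {
        lo = smaller(c) + occ(c, lo);
        hi = smaller(c) + occ(c, hi);
        if lo >= hi {
            return 0;
        }
    }
    hi - lo
}

fn main() {
    let bwt = naive_msbwt(&["ACGT", "CCGT"]);
    println!("{}", String::from_utf8_lossy(&bwt)); // TT$$ACCCGG
    println!("{}", naive_count_kmer(&bwt, "CGT")); // 2
    println!("{}", naive_count_kmer(&bwt, "ACGT")); // 1
}
```

Each step of the loop narrows the range of sorted rotations by one more symbol of the k-mer, so the query cost depends on the k-mer length rather than on the number of reads, provided the occurrence counts are indexed instead of scanned.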
`msbwt2` does not currently have a pre-print or paper. If you use `msbwt2`, please cite one of the `msbwt` papers:
Licensed under either of
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.