`lt-fm-index` is a library for locating and counting nucleotide sequence (ACGT) strings.
`lt-fm-index` uses a k-mer lookup table (as you may have noticed, LT stands for lookup table).
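To make the lookup-table idea concrete, here is a minimal sketch of a k-mer count table built with the standard library only. It illustrates the general technique (encode each k-mer as a base-4 number and use it as a table index); it is not the crate's internal layout, and the names `encode_base`, `kmer_to_index` and `build_kmer_count_table` are hypothetical.

```rust
/// Map a nucleotide to a 2-bit code; anything else yields `None`.
/// (Hypothetical helper, not part of lt-fm-index.)
fn encode_base(b: u8) -> Option<usize> {
    match b {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Encode a k-mer as a base-4 integer, e.g. "AC" -> 0 * 4 + 1 = 1.
fn kmer_to_index(kmer: &[u8]) -> Option<usize> {
    kmer.iter()
        .try_fold(0usize, |acc, &b| encode_base(b).map(|code| acc * 4 + code))
}

/// Build a table with 4^k slots holding the occurrence count of every k-mer.
/// Windows containing a non-nucleotide byte are skipped.
fn build_kmer_count_table(text: &[u8], k: usize) -> Vec<u32> {
    assert!(k > 0, "k must be at least 1");
    let mut table = vec![0u32; 4usize.pow(k as u32)];
    for window in text.windows(k) {
        if let Some(idx) = kmer_to_index(window) {
            table[idx] += 1;
        }
    }
    table
}

fn main() {
    let text = b"CTCCGTACACCTGTTTCGTATCGGA";
    let table = build_kmer_count_table(text, 2);
    let ta = kmer_to_index(b"TA").unwrap();
    println!("occurrences of TA: {}", table[ta]);
}
```

Once such a table exists, answering a query for any k-mer is a single array access instead of a scan, at the cost of 4^k table entries.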
- `FmIndexOn` supports a text containing only nucleotide sequence (ACGT).
- `FmIndexNn` supports a text containing non-nucleotide sequence.
  - `FmIndexNn` treats all non-nucleotide characters as the same character.
- The crate is not stable. Functions can be changed without notice.
- `libdivsufsort` library.

Use `FmIndexConfig` to generate `FmIndex`
```rust
use lt_fm_index::FmIndexConfig;

// (1) Define configuration for fm-index
let fmi_config = FmIndexConfig::new()
    .set_kmer_lookup_table(8)
    .set_suffix_array_sampling_ratio(4)
    .contain_non_nucleotide(); // Default is true

// (2) Generate fm-index with text
let text = b"CTCCGTACACCTGTTTCGTATCGGANNN".to_vec();
let fm_index = fmi_config.generate_fmindex(text); // text is consumed

// (3) Match with pattern
let pattern = b"TA".to_vec();
//   - count
let count = fm_index.count(&pattern);
assert_eq!(count, 2);
//   - locate without k-mer lookup table
let locations = fm_index.locate_wo_klt(&pattern);
assert_eq!(locations, vec![5,18]);
//   - locate with k-mer lookup table
let locations = fm_index.locate_w_klt(&pattern);
assert_eq!(locations, vec![5,18]);
```
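For checking results like the ones asserted above, a brute-force scan is a handy reference. The sketch below uses only the standard library; `naive_locate` and `naive_count` are hypothetical helper names and are not part of `lt-fm-index`.

```rust
/// Return every start position where `pattern` occurs in `text`,
/// in increasing order, using a plain sliding-window scan.
fn naive_locate(text: &[u8], pattern: &[u8]) -> Vec<usize> {
    if pattern.is_empty() || pattern.len() > text.len() {
        return Vec::new();
    }
    text.windows(pattern.len())
        .enumerate()
        .filter(|(_, window)| *window == pattern)
        .map(|(position, _)| position)
        .collect()
}

/// Number of occurrences of `pattern` in `text`.
fn naive_count(text: &[u8], pattern: &[u8]) -> usize {
    naive_locate(text, pattern).len()
}

fn main() {
    let text = b"CTCCGTACACCTGTTTCGTATCGGANNN";
    let pattern = b"TA";
    // Same text and pattern as the example above.
    assert_eq!(naive_count(text, pattern), 2);
    assert_eq!(naive_locate(text, pattern), vec![5, 18]);
}
```

The scan costs time proportional to the text length for every query, which is the cost an index avoids. If an index does not guarantee the order of returned positions, sort them before comparing against this reference.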
Use `FmIndexOn` and `FmIndexNn` structs to generate `FmIndex`
```rust
use lt_fm_index::{FmIndexConfig, FmIndex, FmIndexOn, FmIndexNn};

// (1) Define configuration for fm-index
let fmi_config = FmIndexConfig::new()
    .set_kmer_lookup_table(8)
    .set_suffix_array_sampling_ratio(4)
    .contain_non_nucleotide();

// (2) Generate fm-index with text
//   - Use FmIndexOn struct directly
let text_only_nc = b"CTCCGTACACCTGTTTCGTATCGGA".to_vec();
let fm_index_on = FmIndexOn::new(&fmi_config, text_only_nc); // only_nucleotide field of config is ignored
//   - Use FmIndexNn struct directly
let text_non_nc = b"CTCCGTACACCTGTTTCGTATCGGANNN".to_vec();
let fm_index_nn = FmIndexNn::new(&fmi_config, text_non_nc);

// (3) Match with pattern
let pattern = b"TA".to_vec();
//   - count
let count_on = fm_index_on.count(&pattern);
let count_nn = fm_index_nn.count(&pattern);
assert_eq!(count_on, count_nn);
//   - locate without k-mer lookup table
let locations_on = fm_index_on.locate_wo_klt(&pattern);
let locations_nn = fm_index_nn.locate_wo_klt(&pattern);
assert_eq!(locations_on, locations_nn);
//   - locate with k-mer lookup table
let locations_on = fm_index_on.locate_w_klt(&pattern);
let locations_nn = fm_index_nn.locate_w_klt(&pattern);
assert_eq!(locations_on, locations_nn);
```
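The two indexes agree in the example above because the pattern contains only nucleotides. One way to picture "all non-nucleotide characters are the same character" is as a normalization step applied to the sequence. The sketch below is an illustration of that reading, not the crate's implementation; the function `normalize` and the placeholder byte `b'_'` are assumptions made for the example.

```rust
/// Hypothetical placeholder standing in for every non-nucleotide byte.
const NON_NUCLEOTIDE: u8 = b'_';

/// Keep A, C, G and T as they are and collapse every other byte
/// into one shared placeholder.
fn normalize(sequence: &[u8]) -> Vec<u8> {
    sequence
        .iter()
        .map(|&b| match b {
            b'A' | b'C' | b'G' | b'T' => b,
            _ => NON_NUCLEOTIDE,
        })
        .collect()
}

fn main() {
    let text = normalize(b"CTCCGTACACCTGTTTCGTATCGGANNN");
    // The trailing "NNN" collapses into three placeholders.
    assert_eq!(&text[25..], &b"___"[..]);
    // Under this reading, "N" and "X" cannot be told apart.
    assert_eq!(normalize(b"GGAN"), normalize(b"GGAX"));
}
```

Under this reading, a pattern containing `N` would also match text positions holding any other non-nucleotide byte.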
- What's the difference?
  - `FmIndexConfig::generate_fmindex()` returns the index wrapped in a `Box`, while the `new()` functions of the structs return structs that are not wrapped in a `Box`.
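The boxed-versus-direct distinction can be sketched with plain Rust. The example assumes the boxed value is a trait object, which the text above does not spell out. The trait `PatternIndex`, the two structs and the function `generate` are hypothetical stand-ins rather than the crate's API.

```rust
/// Hypothetical trait standing in for a shared index interface.
trait PatternIndex {
    fn count(&self, pattern: &[u8]) -> usize;
}

/// Hypothetical stand-ins for the two concrete index types.
struct OnlyNucleotide {
    text: Vec<u8>,
}
struct WithNonNucleotide {
    text: Vec<u8>,
}

/// Plain scan used by both stand-ins to keep the sketch short.
fn scan_count(text: &[u8], pattern: &[u8]) -> usize {
    if pattern.is_empty() {
        return 0;
    }
    text.windows(pattern.len()).filter(|w| *w == pattern).count()
}

impl PatternIndex for OnlyNucleotide {
    fn count(&self, pattern: &[u8]) -> usize {
        scan_count(&self.text, pattern)
    }
}

impl PatternIndex for WithNonNucleotide {
    fn count(&self, pattern: &[u8]) -> usize {
        scan_count(&self.text, pattern)
    }
}

/// Config-style constructor: the concrete type is picked at run time,
/// so the caller receives it behind a `Box<dyn PatternIndex>`.
fn generate(contain_non_nucleotide: bool, text: Vec<u8>) -> Box<dyn PatternIndex> {
    if contain_non_nucleotide {
        Box::new(WithNonNucleotide { text })
    } else {
        Box::new(OnlyNucleotide { text })
    }
}

fn main() {
    // Boxed: the type is decided by a run-time flag.
    let boxed = generate(true, b"CTCCGTACACCTGTTTCGTATCGGANNN".to_vec());
    // Direct: the concrete type is known at compile time, no Box involved.
    let direct = OnlyNucleotide { text: b"CTCCGTACACCTGTTTCGTATCGGA".to_vec() };
    assert_eq!(boxed.count(b"TA"), direct.count(b"TA"));
}
```

A boxed trait object is convenient when the choice depends on configuration. Constructing the struct directly avoids the heap allocation and dynamic dispatch when the type is known up front.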
Write and read `FmIndex`
```rust
use lt_fm_index::{FmIndexConfig, FmIndex, FmIndexOn, FmIndexNn};

// (1) Generate FmIndex
let fmi_config = FmIndexConfig::new()
    .set_kmer_lookup_table(8)
    .set_suffix_array_sampling_ratio(4);
let text = b"CTCCGTACACCTGTTTCGTATCGGA".to_vec();
let fm_index_pre = FmIndexOn::new(&fmi_config, text); // text is consumed

// (2) Write fm-index to buffer (or file path)
let mut buffer = Vec::new();
fm_index_pre.write_index_to(&mut buffer).unwrap();

// (3) Read fm-index from buffer (or file path)
let fm_index_pro = FmIndexOn::read_index_from(&buffer[..]).unwrap();

// The index read back must equal the one that was written
assert_eq!(fm_index_pre, fm_index_pro);
```
32bit
integer
https://github.com/baku4/lt-fm-index
libdivsufsort