LT FM-Index

lt-fm-index is a library for locating and counting occurrences of a pattern in a nucleotide sequence (ATGC) string.
lt-fm-index uses a k-mer lookup table (as you may have noticed, LT stands for lookup table).
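
To make the idea of a k-mer lookup table concrete, here is a minimal sketch of how a k-mer over ACGT can be packed into an integer and used as an index into a table with 4^k slots. It is an illustration only and is not taken from the library's source: the function names (`nucleotide_code`, `kmer_table_index`, `kmer_count_table`) and the 2-bit encoding order are assumptions, and the table here stores plain occurrence counts, whereas the table inside an FM-index would hold whatever per-k-mer data the index needs.

```rust
/// Map a nucleotide to a 2-bit code (assumed order: A, C, G, T).
/// Returns `None` for any other byte, such as `N`.
fn nucleotide_code(base: u8) -> Option<usize> {
    match base {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Pack a k-mer into a base-4 integer that can index a table of 4^k slots.
/// Returns `None` if the k-mer contains a non-ACGT byte.
fn kmer_table_index(kmer: &[u8]) -> Option<usize> {
    kmer.iter().try_fold(0usize, |acc, &base| {
        nucleotide_code(base).map(|code| acc * 4 + code)
    })
}

/// Build a table holding the number of occurrences of every k-mer in `text`.
/// Windows that contain a non-ACGT byte are skipped.
fn kmer_count_table(text: &[u8], k: usize) -> Vec<usize> {
    let mut table = vec![0usize; 4usize.pow(k as u32)];
    if k == 0 {
        // `windows(0)` would panic, so return the single-slot table untouched.
        return table;
    }
    for window in text.windows(k) {
        if let Some(index) = kmer_table_index(window) {
            table[index] += 1;
        }
    }
    table
}
```

With this encoding, `kmer_table_index(b"TA")` is `3 * 4 + 0 = 12`, so the count for `TA` can be read from slot 12 of `kmer_count_table(text, 2)` without scanning the text again.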

Description

Features

Examples

Use FmIndexConfig to generate FmIndex

```rust
use lt_fm_index::FmIndexConfig;

// (1) Define configuration for fm-index
let fmi_config = FmIndexConfig::new()
    .set_kmer_lookup_table(8)
    .set_suffix_array_sampling_ratio(4)
    .contain_non_nucleotide(); // Default is true

// (2) Generate fm-index with text
let text = b"CTCCGTACACCTGTTTCGTATCGGANNN".to_vec();
let fm_index = fmi_config.generate_fmindex(text); // text is consumed

// (3) Match with pattern
let pattern = b"TA".to_vec();
//   - count
let count = fm_index.count(&pattern);
assert_eq!(count, 2);
//   - locate without k-mer lookup table
let locations = fm_index.locate_wo_klt(&pattern);
assert_eq!(locations, vec![5, 18]);
//   - locate with k-mer lookup table
let locations = fm_index.locate_w_klt(&pattern);
assert_eq!(locations, vec![5, 18]);
```
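
To see what count and locate are expected to return, the results can be cross-checked against a brute-force scan. The sketch below uses only the standard library; `naive_locate` and `naive_count` are hypothetical helper names introduced for illustration and are not part of lt-fm-index.

```rust
/// Return every 0-based start position of `pattern` in `text`, in ascending order.
/// An empty pattern, or one longer than the text, yields no positions.
fn naive_locate(text: &[u8], pattern: &[u8]) -> Vec<usize> {
    if pattern.is_empty() || pattern.len() > text.len() {
        return Vec::new();
    }
    text.windows(pattern.len())
        .enumerate()
        .filter(|(_, window)| *window == pattern)
        .map(|(position, _)| position)
        .collect()
}

/// Return how many times `pattern` occurs in `text` (overlaps included).
fn naive_count(text: &[u8], pattern: &[u8]) -> usize {
    naive_locate(text, pattern).len()
}
```

For the text `CTCCGTACACCTGTTTCGTATCGGANNN` and the pattern `TA`, the scan finds the pattern at positions 5 and 18, which matches the count of 2 and the locations asserted in the example. The scan costs time proportional to the text length on every query, which is the cost an index is meant to avoid.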

Use the FmIndexOn and FmIndexNn structs to generate FmIndex

```rust
use lt_fm_index::{FmIndexConfig, FmIndex, FmIndexOn, FmIndexNn};

// (1) Define configuration for fm-index
let fmi_config = FmIndexConfig::new()
    .set_kmer_lookup_table(8)
    .set_suffix_array_sampling_ratio(4)
    .contain_non_nucleotide();

// (2) Generate fm-index with text
//   - Use FmIndexOn struct directly
let text_only_nc = b"CTCCGTACACCTGTTTCGTATCGGA".to_vec();
let fm_index_on = FmIndexOn::new(&fmi_config, text_only_nc); // only_nucleotide field of config is ignored
//   - Use FmIndexNn struct directly
let text_non_nc = b"CTCCGTACACCTGTTTCGTATCGGANNN".to_vec();
let fm_index_nn = FmIndexNn::new(&fmi_config, text_non_nc);

// (3) Match with pattern
let pattern = b"TA".to_vec();
//   - count
let count_on = fm_index_on.count(&pattern);
let count_nn = fm_index_nn.count(&pattern);
assert_eq!(count_on, count_nn);
//   - locate without k-mer lookup table
let locations_on = fm_index_on.locate_wo_klt(&pattern);
let locations_nn = fm_index_nn.locate_wo_klt(&pattern);
assert_eq!(locations_on, locations_nn);
//   - locate with k-mer lookup table
let locations_on = fm_index_on.locate_w_klt(&pattern);
let locations_nn = fm_index_nn.locate_w_klt(&pattern);
assert_eq!(locations_on, locations_nn);
```

- What's the difference?
  - `FmIndexConfig::generate_fmindex()` returns the index wrapped in a `Box`, while the `new()` functions of the structs return a struct that is not wrapped in a `Box`.
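
The difference between a boxed value and a plain struct can be sketched with standard Rust alone. The example below assumes the boxed value is a trait object, which this section does not spell out; the trait `PatternCounter`, the struct `NaiveCounter`, and the function `boxed_counter` are hypothetical names invented for the illustration and do not exist in lt-fm-index.

```rust
/// Hypothetical trait standing in for a shared index interface.
trait PatternCounter {
    fn count(&self, pattern: &[u8]) -> usize;
}

/// Hypothetical concrete type that answers queries by scanning the text.
struct NaiveCounter {
    text: Vec<u8>,
}

impl NaiveCounter {
    /// Returns the concrete struct: the caller knows the exact type at compile time.
    fn new(text: Vec<u8>) -> Self {
        Self { text }
    }
}

impl PatternCounter for NaiveCounter {
    fn count(&self, pattern: &[u8]) -> usize {
        if pattern.is_empty() {
            return 0;
        }
        self.text
            .windows(pattern.len())
            .filter(|window| *window == pattern)
            .count()
    }
}

/// Returns a boxed trait object: the concrete type is hidden behind the trait,
/// so one return type can cover several implementations chosen at run time.
fn boxed_counter(text: Vec<u8>) -> Box<dyn PatternCounter> {
    Box::new(NaiveCounter::new(text))
}
```

Both `NaiveCounter::new(text).count(b"TA")` and `boxed_counter(text).count(b"TA")` give the same answer. The boxed form adds a heap allocation and dynamic dispatch in exchange for a single return type, while the concrete form keeps the exact type visible to the caller.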

Write and read FmIndex

```rust
use lt_fm_index::{FmIndexConfig, FmIndex, FmIndexOn, FmIndexNn};

// (1) Generate FmIndex
let fmi_config = FmIndexConfig::new()
    .set_kmer_lookup_table(8)
    .set_suffix_array_sampling_ratio(4);
let text = b"CTCCGTACACCTGTTTCGTATCGGA".to_vec();
let fm_index_pre = FmIndexOn::new(&fmi_config, text); // text is consumed

// (2) Write fm-index to buffer (or file path)
let mut buffer = Vec::new();
fm_index_pre.write_index_to(&mut buffer).unwrap();

// (3) Read fm-index from buffer (or file path)
let fm_index_pro = FmIndexOn::read_index_from(&buffer[..]).unwrap();

assert_eq!(fm_index_pre, fm_index_pro);
```

Future works

https://github.com/baku4/lt-fm-index

Doc

https://docs.rs/lt-fm-index/

Reference