LtFmIndex


LtFmIndex is a Rust library for building and using an FM-index that contains a lookup table of the first k-mer of a pattern. The index can be used to (1) count the occurrences of a pattern and (2) locate its positions in the indexed text.

Usage

Add to dependency

To use this library, add `lt_fm_index` to your `Cargo.toml`:

```toml
[dependencies]
lt_fm_index = "0.7.0-alpha"
```

- About the `fastbwt` feature
  - This feature can accelerate indexing, but it needs `cmake` to build `libdivsufsort` and cannot be built as WASM.
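Enabling the optional feature would typically look like the fragment below. This is a sketch using Cargo's standard feature syntax; the feature name `fastbwt` is taken from the note above and should be checked against the crate's own `Cargo.toml`.

```toml
[dependencies]
# Assumes the feature is named `fastbwt`; requires cmake at build time.
lt_fm_index = { version = "0.7.0-alpha", features = ["fastbwt"] }
```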

Example code

```rust
use lt_fm_index::LtFmIndex;
use lt_fm_index::blocks::Block2; // `Block2` can index 3 types of characters.

// (1) Define characters to use
let characters_by_index: &[&[u8]] = &[
    &[b'A', b'a'], // 'A' and 'a' are treated as the same
    &[b'C', b'c'], // 'C' and 'c' are treated as the same
    &[b'G', b'g'], // 'G' and 'g' are treated as the same
];
// Alternatively, you can use this simpler syntax:
let characters_by_index: &[&[u8]] = &[
    b"Aa", b"Cc", b"Gg"
];

// (2) Build index
let text = b"CTCCGTACACCTGTTTCGTATCGGAXXYYZZ".to_vec();
let lt_fm_index = LtFmIndex::<u32, Block2<u128>>::build(
    text,
    characters_by_index,
    2,
    4,
).unwrap();

// (3) Match with pattern
let pattern = b"TA";
//   - count
let count = lt_fm_index.count(pattern);
assert_eq!(count, 2);
//   - locate
let mut locations = lt_fm_index.locate(pattern);
locations.sort(); // The locations may not be in order.
assert_eq!(locations, vec![5, 18]);
// All unindexed characters are treated as the same character.
// In the text, X, Y, and Z can match any other unindexed character.
let mut locations = lt_fm_index.locate(b"UNDEF");
locations.sort();
// Using b"XXXXX", b"YYYYY", or b"!@#$%" gives the same result.
assert_eq!(locations, vec![25, 26]);

// (4) Save and load
let mut buffer = Vec::new();
lt_fm_index.save_to(&mut buffer).unwrap();
let loaded = LtFmIndex::load_from(&buffer[..]).unwrap();
assert_eq!(lt_fm_index, loaded);
```
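The matching semantics in the example (case-insensitive groups, and every unindexed byte collapsing into one shared character) can be illustrated with a naive reference implementation. The sketch below uses only the standard library and is not how the library works internally: it scans the text linearly instead of using an FM-index. The names `class_of`, `naive_locate`, and `naive_count` are hypothetical helpers introduced here for illustration.

```rust
/// Map a byte to the index of its character group.
/// Every unindexed byte shares one extra class, `characters_by_index.len()`.
fn class_of(byte: u8, characters_by_index: &[&[u8]]) -> usize {
    characters_by_index
        .iter()
        .position(|group| group.contains(&byte))
        .unwrap_or(characters_by_index.len())
}

/// Return every start position where `pattern` matches `text`,
/// comparing character classes rather than raw bytes.
fn naive_locate(text: &[u8], pattern: &[u8], characters_by_index: &[&[u8]]) -> Vec<usize> {
    // `windows(0)` would panic, so handle the empty pattern explicitly.
    if pattern.is_empty() || pattern.len() > text.len() {
        return Vec::new();
    }
    let text_classes: Vec<usize> = text
        .iter()
        .map(|&b| class_of(b, characters_by_index))
        .collect();
    let pattern_classes: Vec<usize> = pattern
        .iter()
        .map(|&b| class_of(b, characters_by_index))
        .collect();
    text_classes
        .windows(pattern_classes.len())
        .enumerate()
        .filter(|(_, window)| *window == pattern_classes.as_slice())
        .map(|(start, _)| start)
        .collect()
}

/// Count occurrences by reusing the locate scan.
fn naive_count(text: &[u8], pattern: &[u8], characters_by_index: &[&[u8]]) -> usize {
    naive_locate(text, pattern, characters_by_index).len()
}
```

Note that `T` is not among the indexed characters in the example, so it falls into the same class as `X`, `Y`, and `Z`. A reference scan like this can be handy for cross-checking index results on small inputs in tests.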

Repository

https://github.com/baku4/lt-fm-index

API Doc

https://docs.rs/lt-fm-index/

Reference