Fast realization of suffix array and suffix tree for substring search, longest common prefix array (lcp array).
The current implementation builds suffix structures using bytes and does not decode the string before or during construction in Unicode. But if Unicode string is normalized before construction and search, then structures support Unicode (because all byte sequences are decoded unambiguously). Also search and lcp returns indexes as in byte array but not in Unicode decoded string.
SuffixTree ```rust use suff_collections::{array::, tree::, lcp::*};
// let word = "Some word"; let word: &str = "Some word\0"; let find: &str = "word";
// construct suffix tree let st: SuffixTree = SuffixTree::new(word);
// finds the entry position of the line 'find' in 'word'
let res: Option
// construct lcp
// lcp[i] = maxpref(sa[i], sa[i - 1]) && lcp.len() == sa.len()
// let lcp: LCP
// convert suffix tree to suffix array
// let sa = SuffixArray::
let mut buff = String::new(); SuffixTree::new("mississippi") .to_graphviz(&mut buff).unwrap(); println!("{}", &buff); ``` SuffixTree for the word "mississippi"
SuffixArray ```rust use suff_collections::{array::, tree::, lcp::*};
// let word = "Some word"; let word: &str = "Some word\0"; let find: &str = "word";
// construct suffix array
// let sa = SuffixArray::
// construct lcp
// lcp[i] = max_pref(sa[i], sa[i - 1]) && lcp.len() == sa.len()
let lcp: LCP
// finds the entry position of the line 'find' in 'word'
// O(|find| * log(|word|))
let res: Option
// finds all the entry position of the line 'find' in 'word' // O(|find| * log(|word|)) let resall: &[usize] = sa.findall(find);
// finds the entry position of the line 'find' in 'word'
// O(|word|)
let res: Option
// finds all the entry position of the line 'find' in 'word' // O(|word|) let resall: &[usize] = sa.findall_big(&sa.lcp(), find);
// convert suffix array to suffix tree let st = SuffixTree::from(sa);
let word = "mississippi";
let sa = SuffixArray::
SuffixArray and lcp for the word "mississippi"
LCP index suffixe's
0 11
0 10 i
1 7 ippi
1 4 issippi
4 1 ississippi
0 0 mississippi
0 9 pi
1 8 ppi
0 6 sippi
2 3 sissippi
1 5 ssippi
3 2 ssissippi
All construction and search work for O(n). For the suffix tree implementation the Ukkonen algorithm is taken and for the suffix array implementation the SA-IS algorithm is taken.
time: [401.96 us 403.67 us 405.75 us]
time: [371.61 us 373.21 us 375.08 us]
time: [2.2718 ms 2.2871 ms 2.3069 ms]
time: [2.5368 ms 2.5529 ms 2.5703 ms]
time: [1.9684 ms 1.9752 ms 1.9828 ms]
time: [2.2396 ms 2.2533 ms 2.2697 ms]
time: [2.3389 ms 2.3777 ms 2.4314 ms]
time: [2.6015 ms 2.6166 ms 2.6327 ms]
time: [2.0221 ms 2.0310 ms 2.0405 ms]
time: [2.2962 ms 2.3078 ms 2.3210 ms]
time: [4.1127 us 4.1273 us 4.1415 us]
time: [4.1591 us 4.1695 us 4.1808 us]
time: [4.1205 us 4.1366 us 4.1528 us]
time: [4.1452 us 4.1560 us 4.1678 us]
time: [22.410 us 22.507 us 22.607 us]
time: [19.944 us 20.299 us 20.752 us]
time: [22.613 us 22.715 us 22.820 us]
time: [19.946 us 20.283 us 20.672 us]
time: [13.533 ms 13.666 ms 13.812 ms]
time: [15.316 ms 15.433 ms 15.558 ms]
time: [20.506 us 20.839 us 21.221 us]
time: [16.392 ms 16.494 ms 16.602 ms]
time: [16.310 ms 16.411 ms 16.516 ms]
time: [16.084 ms 16.210 ms 16.348 ms]
time: [16.150 ms 16.260 ms 16.374 ms]
time: [6.7994 ms 6.8324 ms 6.8688 ms]
time: [6.5991 ms 6.6267 ms 6.6577 ms]
time: [6.2382 ms 6.2643 ms 6.2934 ms]
time: [6.2558 ms 6.2853 ms 6.3185 ms]