trie-rs

Memory efficient trie (prefix tree) library based on LOUDS.

Master API Docs | Released API Docs | Benchmark Results | Changelog

Build Status Crates.io Version Crates.io Downloads Minimum rustc version License: MIT License: Apache 2.0

Quickstart

To use trie-rs, add the following to your Cargo.toml file:

toml [dependencies] trie-rs = "0.1" # NOTE: Replace to latest minor version.

Usage Overview

```rust use std::str; use trie_rs::TrieBuilder;

let mut builder = TrieBuilder::new(); // Inferred TrieBuilder<u8> automatically builder.push("すし"); builder.push("すしや"); builder.push("すしだね"); builder.push("すしづめ"); builder.push("すしめし"); builder.push("すしをにぎる"); builder.push("すし"); // Word pushed twice is just ignored. builder.push("🍣");

let trie = builder.build();

// exactmatch(): Find a word exactly match to query. asserteq!(trie.exactmatch("すし"), true); asserteq!(trie.exactmatch("🍣"), true); asserteq!(trie.exact_match("🍜"), false);

// predictivesearch(): Find words which include query as their prefix. let resultsinu8s: Vec> = trie.predictivesearch("すし"); let resultsinstr: Vec<&str> = resultsinu8s .iter() .map(|u8s| str::fromutf8(u8s).unwrap()) .collect(); asserteq!( resultsinstr, vec![ "すし", "すしだね", "すしづめ", "すしめし", "すしや", "すしをにぎる" ] // Sorted by Vec<u8>'s order );

// commonprefixsearch(): Find words which is included in query's prefix. let resultsinu8s: Vec> = trie.commonprefixsearch("すしや"); let resultsinstr: Vec<&str> = resultsinu8s .iter() .map(|u8s| str::fromutf8(u8s).unwrap()) .collect(); asserteq!( resultsinstr, vec![ "すし", "すしや", ] // Sorted by Vec<u8>'s order ); ```

Using with Various Data Types

TrieBuilder is implemented using generic type like following:

impl<Label: Ord + Clone> TrieBuilder<Label> { ... pub fn push<Arr: AsRef<[Label]>>(&mut self, word: Arr) { ... } ... }

In the above Usage Overview example, we used Label=u8, Arr=&str.

Here shows other Label and Arr type examples.

Label=&str, Arr=Vec<&str>

Say Label is English words and Arr is English phrases.

```rust use trie_rs::TrieBuilder;

let mut builder = TrieBuilder::new(); builder.push(vec!["a", "woman"]); builder.push(vec!["a", "woman", "on", "the", "beach"]); builder.push(vec!["a", "woman", "on", "the", "run"]);

let trie = builder.build();

asserteq!( trie.exactmatch(vec!["a", "woman", "on", "the", "beach"]), true ); asserteq!( trie.predictivesearch(vec!["a", "woman", "on"]), vec![ ["a", "woman", "on", "the", "beach"], ["a", "woman", "on", "the", "run"], ], ); asserteq!( trie.commonprefix_search(vec!["a", "woman", "on", "the", "beach"]), vec![vec!["a", "woman"], vec!["a", "woman", "on", "the", "beach"]], ); ```

Label=u8, Arr=[u8; n]

Say Label is a digit in Pi (= 3.14...) and Arr is a window to separate pi's digit by 10.

```rust use trie_rs::TrieBuilder;

let mut builder = TrieBuilder::::new(); // Pi = 3.14...

builder.push([1, 4, 1, 5, 9, 2, 6, 5, 3, 5]); builder.push([8, 9, 7, 9, 3, 2, 3, 8, 4, 6]); builder.push([2, 6, 4, 3, 3, 8, 3, 2, 7, 9]); builder.push([6, 9, 3, 9, 9, 3, 7, 5, 1, 0]); builder.push([5, 8, 2, 0, 9, 7, 4, 9, 4, 4]); builder.push([5, 9, 2, 3, 0, 7, 8, 1, 6, 4]); builder.push([0, 6, 2, 8, 6, 2, 0, 8, 9, 9]); builder.push([8, 6, 2, 8, 0, 3, 4, 8, 2, 5]); builder.push([3, 4, 2, 1, 1, 7, 0, 6, 7, 9]); builder.push([8, 2, 1, 4, 8, 0, 8, 6, 5, 1]); builder.push([3, 2, 8, 2, 3, 0, 6, 6, 4, 7]); builder.push([0, 9, 3, 8, 4, 4, 6, 0, 9, 5]); builder.push([5, 0, 5, 8, 2, 2, 3, 1, 7, 2]); builder.push([5, 3, 5, 9, 4, 0, 8, 1, 2, 8]);

let trie = builder.build();

asserteq!(trie.exactmatch([5, 3, 5, 9, 4, 0, 8, 1, 2, 8]), true); asserteq!( trie.predictivesearch([3]), vec![ [3, 2, 8, 2, 3, 0, 6, 6, 4, 7], [3, 4, 2, 1, 1, 7, 0, 6, 7, 9], ], ); asserteq!( trie.commonprefix_search([1, 4, 1, 5, 9, 2, 6, 5, 3, 5]), vec![[1, 4, 1, 5, 9, 2, 6, 5, 3, 5]], ); ```

Features

Acknowledgments

edict.furigana is used for benchmark. This file is constructed in the following step:

  1. Download edict.gz from EDICT.
  2. Convert it from original EUC into UTF-8.
  3. Translate it into CSV file with edict-to-csv.
  4. Extract field $1 for Hiragana/Katakana words, and field $3 for other (like Kanji) words.
  5. Translate Katakana into Hiragana with kana2hira.

Many thanks for these dictionaries and tools.

Versions

trie-rs uses semantic versioning.

Since current major version is 0, minor version update might involve breaking public API change (although it is carefully avoided).

Rust Version Supports

trie-rs is continuously tested with these Rust versions in Travis CI:

So it expectedly works with Rust 1.33.0 and any newer versions.

Older versions may also work, but are not tested or guaranteed.

Contributing

Any kind of pull requests are appreciated.

Guidelines

License

MIT OR Apache-2.0