mih-rs

Documentation Crates.io License: MIT

Rust implementation of multi-index hashing (MIH) for neighbor searches on binary codes in the Hamming space, described in the paper

Norouzi, Punjani, and Fleet, Fast exact search in Hamming space with multi-index hashing, IEEE TPAMI, 36(6):1107– 1119, 2014.

Features

Example

```rust use mih_rs::Index;

fn main() { // Database of codes let codes: [u64; 8] = [ 0b1111111111111111111111011111111111111111111111111011101111111111, // #zeros = 3 0b1111111111111111111111111111111101111111111011111111111111111111, // #zeros = 2 0b1111111011011101111111111111111101111111111111111111111111111111, // #zeros = 4 0b1111111111111101111111111111111111111000111111111110001111111110, // #zeros = 8 0b1101111111111111111111111111111111111111111111111111111111111111, // #zeros = 1 0b1111111111111111101111111011111111111111111101001110111111111111, // #zeros = 6 0b1111111111111111111111111111111111101111111111111111011111111111, // #zeros = 2 0b1110110101011011011111111111111101111111111111111000011111111111, // #zeros = 11 ];

// Query code
let qcode: u64 = 0b1111111111111111111111111111111111111111111111111111111111111111;

// Construct the index
let index = Index::new(&codes).unwrap();

// Find the ids of neighbor codes whose Hamming distances are within 2
let answers = index.range_search(qcode, 2);
println!("{:?}", answers); // [1, 4, 6]

// Find the ids of the top-4 nearest neighbor codes
let answers = index.topk_search(qcode, 4);
println!("{:?}", answers); // [4, 1, 6, 0]

} ```

Binary code types

Index can be built from an array of type CodeInt that is a primitive integer trait supporting a popcount operation. Currently, this library defines CodeInt for u8, u16, u32, u64, and u128. That is, Index supports neighbor searches on these binary code types.

Benchmark

timeperf_topk.rs offers the benchmark of top-K search for MIH and LinearSearch algorithms on binary code types u32, u64, and u128.

The following table shows the result of average search times in milliseconds per query, in the settings:

Result for u32

| Algorithm | N=10,000 | N=100,000 | N=1,000,000 | N=10,000,000 | | ------------ | -------: | --------: | ----------: | -----------: | | MIH (K=1) | 0.01 | 0.02 | 0.07 | 0.38 | | MIH (K=10) | 0.04 | 0.08 | 0.30 | 1.06 | | MIH (K=100) | 0.13 | 0.22 | 1.22 | 4.35 | | LinearSearch | 0.36 | 4.40 | 50.96 | 626.87 |

Result for u64

| Algorithm | N=10,000 | N=100,000 | N=1,000,000 | N=10,000,000 | | ------------ | -------: | --------: | ----------: | -----------: | | MIH (K=1) | 0.10 | 0.36 | 1.46 | 6.7 | | MIH (K=10) | 0.20 | 0.76 | 3.72 | 14.8 | | MIH (K=100) | 0.41 | 1.53 | 7.02 | 33.2 | | LinearSearch | 0.36 | 4.36 | 52.28 | 629.1 |

Result for u128

| Algorithm | N=10,000 | N=100,000 | N=1,000,000 | N=10,000,000 | | ------------ | -------: | --------: | ----------: | -----------: | | MIH (K=1) | 0.57 | 3.24 | 23.7 | 162 | | MIH (K=10) | 0.83 | 4.95 | 42.6 | 323 | | MIH (K=100) | 1.10 | 7.71 | 69.5 | 416 | | LinearSearch | 0.48 | 5.47 | 62.3 | 698 |

Licensing

This library is free software provided under MIT.