Hierarchical Navigable Small World Graph for fast ANN search
Enable the serde1
feature to serialize and deserialize Hamming
and Euclidean
types.
```rust use hnsw::{Searcher, HNSW}; use space::{Hamming, Neighbor};
fn main() {
let mut searcher = Searcher::default();
let mut hnsw: HNSW
let features = [
0b0001, 0b0010, 0b0100, 0b1000, 0b0011, 0b0110, 0b1100, 0b1001,
];
// Insert all features. A searcher data structure is used to avoid performing
// memory allocations every insertion and search. Reuse searchers for speed.
for &feature in &features {
hnsw.insert(Hamming(feature), &mut searcher);
}
// Allocate an array to store nearest neighbors.
let mut neighbors = [Neighbor::invalid(); 8];
// Pass the whole neighbors array as a slice. It will attempt to fill the whole array
// with nearest neighbors from nearest to furthest.
hnsw.nearest(&Hamming(0b0001), 24, &mut searcher, &mut neighbors);
// Distance 1
neighbors[1..3].sort_unstable();
// Distance 2
neighbors[3..6].sort_unstable();
// Distance 3
neighbors[6..8].sort_unstable();
assert_eq!(
neighbors,
[
Neighbor {
index: 0,
distance: 0
},
Neighbor {
index: 4,
distance: 1
},
Neighbor {
index: 7,
distance: 1
},
Neighbor {
index: 1,
distance: 2
},
Neighbor {
index: 2,
distance: 2
},
Neighbor {
index: 3,
distance: 2
},
Neighbor {
index: 5,
distance: 3
},
Neighbor {
index: 6,
distance: 3
}
]
);
} ```
Please refer to the space
documentation for the trait and types regarding distance. It also contains special Bits128
- Bits4096
tuple structs that wrap an array of bytes and enable SIMD capability. Benchmarks provided use these SIMD impls.
An implementation is currently not provided for euclidean distance after a recent refactor. Hamming distance was more relevant at the time, and so that was prioritized. To implement euclidean distance, do something roughly like the following:
```rust struct Euclidean<'a>(&'a [f32]);
impl MetricPoint for Euclidean<'> {
fn distance(&self, rhs: &Self) -> u32 {
space::f32metric(
self.0
.iter()
.zip(rhs.0.iter())
.map(|(&a, &b)| (a - b).powi(2))
.sum::
Note that the above implementation may have some numerical error on high dimensionality. In that case use a Kahan sum instead.
Here is a recall graph that you can compare to its alternatives:
For more benchmarks and how to benchmark, see benchmarks.md
.
This is based on the paper "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs" by Yu. A. Malkov and D. A. Yashunin. This paper builds on the original paper for NSW. There are multiple papers written by the authors on NSW, which proceeded HNSW.
For more details about parameters and details of the implementation, see implementation.md
.
This is in no way a direct copy or reimplementation of the original implementation. This was made purely based on the paper without reference to the original headers. The paper is very well written and easy to understand, with some minor exceptions. Thank you to the authors for your valuble contribution.
Please visit the Rust CV Discord.