USearch

C++11 Single Header Vector Search
Compact, yet Powerful



Usage

There are two usage patterns:

C++

To use USearch in a C++ project, simply copy the include/usearch/usearch.hpp header into your project. Alternatively, fetch it with CMake:

    include(FetchContent)
    FetchContent_Declare(usearch GIT_REPOSITORY https://github.com/unum-cloud/usearch.git)
    FetchContent_MakeAvailable(usearch)

A basic usage example requires bringing in the unum::usearch namespace and choosing the right "distance" function, which can be one of the following templates:

That list is easily extensible and can include similarity measures for vectors that have a different number of elements/dimensions. A minimal example:

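```c++
#include <cstddef> // `std::size_t`
#include <usearch/usearch.hpp> // Adjust to where the header lives in your project

using namespace unum::usearch;

// Cosine distance is assumed here; substitute the metric template you need
index_gt<cos_gt<float>> index;
float vec[3] = {0.1f, 0.3f, 0.2f};

index.reserve(10);
index.add(/* label: */ 42, /* vector: */ {&vec[0], 3});
// Callback parameters are assumed: the label and distance of each match
index.search(
    /* query: */ {&vec[0], 3}, /* top */ 5 /* results */,
    /* with callback: */ [](std::size_t label, float distance) { });

index.save("index.usearch"); // Serializing to disk
index.load("index.usearch"); // Reconstructing from disk
index.view("index.usearch"); // Memory-mapping from disk
```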

The add method is thread-safe, so an index can be constructed concurrently from multiple threads.
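To make the idea of a "distance" function more concrete, here is a minimal sketch of two such measures. It is written in plain Python purely for brevity and is not USearch code: the names `cosine_distance` and `jaccard_distance` are hypothetical, and the "1 - similarity" convention is an assumption of this sketch rather than a statement about the library's internals. The Jaccard variant illustrates a measure whose inputs may have a different number of elements.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("cosine distance needs equal-length vectors")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector is treated as orthogonal
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| for two collections of ids.

    The inputs are treated as sets, so they may differ in length.
    """
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Two empty sets are considered identical in this sketch
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)
```

Calling `cosine_distance(vec, vec)` on any non-zero vector gives a value close to zero, while `jaccard_distance([1, 2, 3], [2, 3, 4, 5])` compares inputs of length three and four without any padding.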

Python

Python bindings are implemented with pybind/pybind11. Because Python's Global Interpreter Lock (GIL) prevents Python threads from running in parallel, large insertions are parallelized by spawning threads in the C++ layer.

```sh
$ pip install usearch
```

```python
import numpy as np
import usearch

index = usearch.Index(
    dim=256, # Define the number of dimensions in input vectors
    metric='cos', # Choose the "metric" or "distance", default = 'ip', optional
    dtype='f16', # Quantize to 'f16' or 'i8q100' if needed, default = 'f32', optional
    connectivity=16, # How frequent should the connections in the graph be, optional
    expansion_add=128, # Control the recall of indexing, optional
    expansion_search=64, # Control the quality of search, optional
)

n = 100
labels = np.array(range(n), dtype=np.longlong)
vectors = np.random.uniform(0, 0.3, (n, index.ndim)).astype(np.float32)

# The copy argument controls whether the data is copied;
# avoiding the copy is handy when building 1B+ indexes from memory-mapped files
index.add(labels, vectors, copy=True)
assert len(index) == n

# You can search a batch at once
matches, distances, counts = index.search(vectors, 10)
```
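The dtype option above trades precision for memory: an 'f16' value occupies two bytes instead of the four used by 'f32'. The following sketch gives a feel for the precision lost in that trade. It relies only on the half-precision format ('e') of Python's standard struct module and does not call USearch; the helper names `to_f16_and_back` and `max_abs_error` are hypothetical, and how the library quantizes internally is not shown here.

```python
import struct


def to_f16_and_back(values):
    """Round each value to IEEE 754 half precision, then widen it back to a float."""
    fmt = f"<{len(values)}e"  # 'e' is the 2-byte half-precision format
    packed = struct.pack(fmt, *values)
    return list(struct.unpack(fmt, packed))


def max_abs_error(original, quantized):
    """Largest absolute difference between the original and quantized values."""
    return max(abs(x - q) for x, q in zip(original, quantized))


vector = [0.1, 0.3, 0.2]
quantized = to_f16_and_back(vector)
error = max_abs_error(vector, quantized)
```

For components of this magnitude the round-trip error stays below 2**-12, which is often acceptable for similarity search but worth checking against your own recall requirements.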

Features

Bring your Threads

Performance

TODO