CLAM: Clustered Learning of Approximate Manifolds (v0.11.1)

CLAM is a Rust/Python library for learning approximate manifolds from data. It is designed to be fast, memory-efficient, easy to use, and scalable for big data applications.

CLAM provides utilities for fast search (CAKES) and anomaly detection (CHAODA).

Installation

Python

```shell

python3 -m pip install abd_clam ```

Rust

```shell

cargo add abd_clam ```

Usage

Python

```python from abdclam.search import CAKES from abdclam.utils import synthetic_datasets

Get the data.

data, _ = synthetic_datasets.bullseye()

data is a numpy.ndarray in this case but it could just as easily be a numpy.memmap if your data do fit in RAM.

We used numpy memmaps for the research, though they impose file-IO costs.

model = CAKES(data, 'euclidean')

The Search class provides the functionality described in our CHESS paper.

model.build(max_depth=50)

Build the search tree to depth of 50.

This method can be called again with a higher depth, if needed.

query, radius = data[0], 0.5

rnnresults = model.rnnsearch(query, radius)

This is how we perform ranged nearest neighbors search with radius 0.5 around the query.

The results are returned as a dictionary whose keys are indices into the data array and whose values are the distance to the query.

knnresults = model.knnsearch(query, 10)

This is how we perform k-nearest neighbors search for the 10 nearest neighbors of query.

TODO: Provide snippets for using CHAODA

```

Contributing

Pull requests and bug reports are welcome. For major changes, please open an issue to discuss what you would like to change.

License

MIT

Citation

TODO