CLAM: Clustered Learning of Approximate Manifolds (v0.21.4)

CLAM is a Rust/Python library for learning approximate manifolds from data. It is designed to be fast, memory-efficient, easy to use, and scalable for big data applications.

CLAM provides utilities for fast search (Cakes) and anomaly detection (Chaoda).

As of this writing, the project is still in a pre-1.0 state. This means that the API is not yet stable and breaking changes may occur frequently.

Usage

CLAM is a library crate, so you can add it to your project with `cargo add abd_clam@0.21.4`.

Here is a simple example of how to use CLAM to perform nearest-neighbor search (it also uses the `symagen` crate to generate random data):

```rust
use symagen::random_data;

use abd_clam::{knn, rnn, Cakes, PartitionCriteria, VecDataset};

/// Euclidean distance function.
///
/// This function is used to compute the distance between two points for the purposes
/// of this demo. You can use your own distance function instead. The required
/// signature is `fn(T, T) -> U` where `T` is the type of the points (must
/// implement `Send`, `Sync` and `Copy`) and `U` is a `Number` type (e.g. `f32`)
/// from the `distances` crate.
fn euclidean(x: &[f32], y: &[f32]) -> f32 {
    x.iter()
        .zip(y.iter())
        .map(|(a, b)| a - b)
        .map(|v| v * v)
        .sum::<f32>()
        .sqrt()
}

// Some parameters for generating random data.
let seed = 42;
let (cardinality, dimensionality) = (1000, 10);
let (min_val, max_val) = (-1., 1.);

// Generate some random data. You can use your own data here.
let data: Vec<Vec<f32>> = random_data::random_f32(cardinality, dimensionality, min_val, max_val, seed);

// We will use the first point in data as our query, and we will perform
// RNN search with a radius of 0.05 and KNN search for the 10 nearest neighbors.
let query: Vec<f32> = data[0].clone();
let radius: f32 = 0.05;
let k = 10;

// We need the contents of data to be &[f32] instead of Vec<f32>. We will rectify this
// in CLAM by extending the trait bounds of some types in CLAM.
let data: Vec<&[f32]> = data.iter().map(Vec::as_slice).collect::<Vec<_>>();

let name = "demo".to_string();  // The name of the dataset.
let is_metric_expensive = false;  // We will assume that our distance function is cheap to compute.

// The metric function itself will be given to Cakes.
let data = VecDataset::new(name, data, euclidean, is_metric_expensive);

// We will use the default partition criteria for this example. This will partition
// the data until each Cluster contains a single unique point.
let criteria = PartitionCriteria::default();

// The Cakes struct provides the functionality described in the CHESS paper.
// This line performs a non-trivial amount of work.
let model = Cakes::new(data, Some(seed), criteria);

// We will soon add the ability to save and load models, but for now we will
// just use the model we just created.

// We can now perform RNN search on the model.
let rnn_results: Vec<(usize, f32)> = model.rnn_search(&query, radius, rnn::Algorithm::Clustered);
assert!(!rnn_results.is_empty());

// We can also perform KNN search on the model.
let knn_results: Vec<(usize, f32)> = model.knn_search(&query, k, knn::Algorithm::RepeatedRnn);
assert!(knn_results.len() >= k);

// Both results are a Vec of 2-tuples where the first element is the index of the point
// in the dataset and the second element is the distance from the query point.
```
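
For instance, you could inspect the hits from the search above. This is a minimal sketch that only reuses the `knn_results` vector from the example and the standard library; the sorting and printing are not part of the CLAM API.

```rust
// Sort a copy of the hits by distance (ascending) and print the closest one,
// reusing `knn_results` from the example above. Standard library only.
let mut hits = knn_results.clone();
hits.sort_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap());
if let Some(&(index, distance)) = hits.first() {
    println!("closest hit: index {index}, distance {distance:.4}");
}
```

Since the query in the example is the first point of the dataset itself, the closest hit should be that same point at distance zero.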

License

References

Citation

TODO