CLAM is a Rust/Python library for learning approximate manifolds from data. It is designed to be fast, memory-efficient, easy to use, and scalable for big data applications.
CLAM provides utilities for fast search (CAKES) and anomaly detection (CHAODA).
At the time of writing, the project is still in a pre-1.0 state: the API is not yet stable, and breaking changes may occur frequently.
CLAM is a library crate, so you can add it to your crate using `cargo add abd_clam@0.17.0`.
Here is a simple example of how to use CLAM to perform nearest neighbors search:
```rust
use symagen::random_data;

use abd_clam::{
    cakes::{KnnAlgorithm, RnnAlgorithm, CAKES},
    cluster::PartitionCriteria,
    dataset::VecVec,
};

/// Euclidean distance between two points of equal dimensionality.
fn euclidean(x: &[f32], y: &[f32]) -> f32 {
    x.iter()
        .zip(y.iter())
        .map(|(a, b)| (a - b).powi(2))
        .sum::<f32>()
        .sqrt()
}

fn main() {
    // Get the data and queries. We will generate some random data for this demo.
    let seed = 42;
    let (cardinality, dimensionality) = (1000, 10);
    let (min_val, max_val) = (-1., 1.);

    let data = random_data::random_f32(cardinality, dimensionality, min_val, max_val, seed);
    let data = data.iter().map(|v| v.as_slice()).collect::<Vec<_>>();

    let dataset = VecVec::new(data.clone(), euclidean, "demo".to_string(), false);
    let criteria = PartitionCriteria::new(true).with_min_cardinality(1);
    let model = CAKES::new(dataset, Some(seed), criteria);

    // The CAKES struct provides the functionality described in the CHESS paper.

    let (query, radius, k) = (&data[0], 0.05, 10);

    // This is how we perform ranged nearest neighbors search with radius 0.05
    // around the query.
    let rnn_results: Vec<(usize, f32)> = model.rnn_search(query, radius, RnnAlgorithm::Clustered);
    assert!(!rnn_results.is_empty());

    // This is how we perform k-nearest neighbors search for the 10 nearest
    // neighbors of query.
    let knn_results: Vec<(usize, f32)> = model.knn_search(query, k, KnnAlgorithm::RepeatedRnn);
    assert!(knn_results.len() >= k);

    // Both results are a `Vec` of 2-tuples where each tuple is the index and
    // distance to points in the data.
}
```
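To make the shape of these results concrete, here is a minimal brute-force sketch that produces the same kind of `Vec<(usize, f32)>` output by scanning every point. It is not how CAKES works internally and does not use the `abd_clam` API. The helper names `rnn_linear` and `knn_linear` are hypothetical and exist only for this illustration. A baseline like this may be useful for sanity-checking search results on small datasets.

```rust
/// Euclidean distance between two points of equal dimensionality.
fn euclidean(x: &[f32], y: &[f32]) -> f32 {
    x.iter()
        .zip(y.iter())
        .map(|(a, b)| (a - b).powi(2))
        .sum::<f32>()
        .sqrt()
}

/// Hypothetical helper (not part of abd_clam): ranged search by linear scan.
/// Returns (index, distance) for every point within `radius` of `query`.
fn rnn_linear(data: &[Vec<f32>], query: &[f32], radius: f32) -> Vec<(usize, f32)> {
    data.iter()
        .enumerate()
        .map(|(i, p)| (i, euclidean(query, p)))
        .filter(|&(_, d)| d <= radius)
        .collect()
}

/// Hypothetical helper (not part of abd_clam): k-nearest search by linear scan.
/// Returns at most `k` points as (index, distance), nearest first.
fn knn_linear(data: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut hits: Vec<(usize, f32)> = data
        .iter()
        .enumerate()
        .map(|(i, p)| (i, euclidean(query, p)))
        .collect();
    // `total_cmp` defines a total order on f32, so the sort cannot panic on NaN.
    hits.sort_by(|a, b| a.1.total_cmp(&b.1));
    hits.truncate(k);
    hits
}
```

Calling `knn_linear(&data, &query, 10)` on a `Vec<Vec<f32>>` returns exactly `min(10, data.len())` hits, whereas the CAKES example above only asserts that at least `k` hits come back, so compare the two with that difference in mind.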
TODO