A data embedding tool and related data analysis or clustering

The crate will provide:

  1. Some variations on data embedding tools from t-SNE (2008) to UMAP (2018). Our implementation is a mix of the various embedding algorithms recently published and mentioned in References.

  2. An implementation of the Mapper algorithm using the C++ Ripser module from U. Bauer.

  3. Some by-products :

The crate is in a preliminary state

Currently only the approximated SVD and a first version of the embedding (with possible hierarchical initialization) are implemented. The mnist examples nevertheless show how to run the embedding, even in this preliminary state.

Building

The crate provides 2 features to choose between openblas-static and intel-mkl-static.
So either --features "openblas-static" or --features "intel-mkl-static" must be passed to cargo to compile. Alternatively, define the default in Cargo.toml.

Results

These are preliminary results. Timings are given for an 8-core i7 @ 2.3 GHz laptop.

Embedder examples

  1. MNIST digits database Cf mnist-digits

It consists of 70000 images of handwritten digits, each of 784 pixels.

mnist

The embedding took 22 s, of which 9 s were spent in the ANN construction.

  2. MNIST fashion database Cf mnist-fashion

It consists of 70000 images of clothes.

Randomized SVD

The randomized SVD is based on the paper of Halko-Tropp.
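To illustrate the idea behind this family of methods, here is a minimal sketch of a randomized range finder, the first stage of a randomized SVD: sample the range of A with a Gaussian matrix, orthonormalize the sample to get Q, and check that A is close to Q Q^T A. It is not the crate's implementation or API. Every name in it (Rng, random_matrix, range_finder, frobenius_residual) is hypothetical, it uses only the standard library, a small xorshift generator stands in for a real random number crate, and Gram-Schmidt stands in for a proper QR factorization.

```rust
// Sketch of a randomized range finder, std only.
// All names are hypothetical and are NOT the crate's API.
type Matrix = Vec<Vec<f64>>; // row-major: m[i][j]

/// Tiny deterministic xorshift64* generator (assumption: stands in for a real RNG).
struct Rng(u64);

impl Rng {
    fn uniform(&mut self) -> f64 {
        self.0 ^= self.0 >> 12;
        self.0 ^= self.0 << 25;
        self.0 ^= self.0 >> 27;
        let x = self.0.wrapping_mul(0x2545F4914F6CDD1D);
        ((x >> 11) as f64 + 1.0) / (1u64 << 53) as f64 // value in (0, 1]
    }
    /// Standard normal sample via the Box-Muller transform.
    fn gaussian(&mut self) -> f64 {
        let (u1, u2) = (self.uniform(), self.uniform());
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

fn random_matrix(rows: usize, cols: usize, rng: &mut Rng) -> Matrix {
    (0..rows).map(|_| (0..cols).map(|_| rng.gaussian()).collect()).collect()
}

fn matmul(a: &Matrix, b: &Matrix) -> Matrix {
    let (n, k, m) = (a.len(), b.len(), b[0].len());
    let mut c = vec![vec![0.0; m]; n];
    for i in 0..n {
        for l in 0..k {
            for j in 0..m {
                c[i][j] += a[i][l] * b[l][j];
            }
        }
    }
    c
}

fn transpose(a: &Matrix) -> Matrix {
    (0..a[0].len()).map(|j| a.iter().map(|row| row[j]).collect()).collect()
}

fn norm(v: &[f64]) -> f64 {
    v.iter().map(|x| x * x).sum::<f64>().sqrt()
}

/// Modified Gram-Schmidt on the rows of `v`; rows that are numerically
/// dependent on earlier ones are dropped.
fn orthonormalize_rows(v: Matrix) -> Matrix {
    let mut q: Matrix = Vec::new();
    for mut row in v {
        let norm_before = norm(&row);
        for _ in 0..2 { // second pass re-orthogonalizes for stability
            for basis in &q {
                let dot: f64 = row.iter().zip(basis).map(|(x, y)| x * y).sum();
                for (x, y) in row.iter_mut().zip(basis) {
                    *x -= dot * y;
                }
            }
        }
        let norm_after = norm(&row);
        if norm_after > 1e-10 * norm_before {
            row.iter_mut().for_each(|x| *x /= norm_after);
            q.push(row);
        }
    }
    q
}

/// Returns Q^T, whose rows are an orthonormal basis sampled from the range of A.
fn range_finder(a: &Matrix, rank: usize, oversampling: usize, rng: &mut Rng) -> Matrix {
    let omega = random_matrix(a[0].len(), rank + oversampling, rng);
    let y = matmul(a, &omega); // Y = A * Omega samples the range of A
    orthonormalize_rows(transpose(&y))
}

/// Frobenius norm of A - Q Q^T A.
fn frobenius_residual(a: &Matrix, qt: &Matrix) -> f64 {
    let b = matmul(qt, a); // B = Q^T A, the small matrix whose SVD would come next
    let approx = matmul(&transpose(qt), &b);
    let mut sum = 0.0_f64;
    for (ra, rb) in a.iter().zip(&approx) {
        for (x, y) in ra.iter().zip(rb) {
            sum += (x - y) * (x - y);
        }
    }
    sum.sqrt()
}

fn main() {
    let mut rng = Rng(0x9E3779B97F4A7C15);
    // Synthetic 60 x 40 matrix of rank 5, built as a product of two thin factors.
    let u = random_matrix(60, 5, &mut rng);
    let v = random_matrix(5, 40, &mut rng);
    let a = matmul(&u, &v);
    let qt = range_finder(&a, 5, 3, &mut rng);
    println!("basis size = {}, residual = {:e}", qt.len(), frobenius_residual(&a, &qt));
}
```

A full randomized SVD would go on to compute the SVD of the small matrix B = Q^T A and multiply its left singular vectors by Q. That step is left out here because it needs a dense SVD routine, which in practice comes from a BLAS/LAPACK backend.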

Mapper

Installation

compile with :

References

License

Licensed under either of

  1. Apache License, Version 2.0, LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0

  2. MIT license LICENSE-MIT or http://opensource.org/licenses/MIT

at your option.

This software was written on my own while working at CEA.