A data embedding tool and related data analysis or clustering

The crate will provide:

  1. Some variations on data embedding tools, from t-SNE (2008) to UMAP (2018). Our implementation is a mix of the various recently published embedding algorithms listed in References.

  2. An implementation of the Mapper algorithm using the C++ Ripser module from U. Bauer.

  3. Some by-products:

The crate is in a preliminary state

Currently, only the approximated SVD and a first version of the embedding are implemented. The mnist example nevertheless shows how to run the embedding in this preliminary state.

Results

These are preliminary results. Timings are given for a laptop with a 4-core 2.7 GHz i7 processor.

Embedder

  1. MNIST digits database (cf. mnist-digits)

It consists of 70,000 images of handwritten digits, each of 784 pixels (28 x 28).

[Figure: mnist]

The embedding took 60 s, of which 18 s were spent in the ANN (approximate nearest neighbours) construction.
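To make concrete what the nearest-neighbour step produces, here is a minimal sketch that builds an exact k-nearest-neighbour graph by brute force. It assumes that "ann" refers to approximate nearest-neighbour search; the function names `dist2` and `knn_brute_force` are hypothetical and are not part of this crate's API. Brute force costs O(n^2 d), which is precisely what an approximate index is meant to avoid on 70000 points.

```rust
/// Squared Euclidean distance between two points of equal dimension.
fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// For every point, return the indices of its `k` nearest neighbours
/// (excluding the point itself), closest first.
/// Hypothetical helper: exact brute force, not the crate's approximate search.
fn knn_brute_force(data: &[Vec<f32>], k: usize) -> Vec<Vec<usize>> {
    (0..data.len())
        .map(|i| {
            // Distances from point i to every other point.
            let mut others: Vec<(f32, usize)> = (0..data.len())
                .filter(|&j| j != i)
                .map(|j| (dist2(&data[i], &data[j]), j))
                .collect();
            // Sort by increasing distance and keep the k closest indices.
            others.sort_by(|a, b| a.0.total_cmp(&b.0));
            others.into_iter().take(k).map(|(_, j)| j).collect()
        })
        .collect()
}

fn main() {
    // Four toy points in the plane stand in for the 784-dimensional images.
    let data: Vec<Vec<f32>> = vec![
        vec![0.0, 0.0],
        vec![1.0, 0.0],
        vec![0.0, 3.0],
        vec![5.0, 5.0],
    ];
    let graph = knn_brute_force(&data, 2);
    for (i, nbrs) in graph.iter().enumerate() {
        println!("point {}: neighbours {:?}", i, nbrs);
    }
}
```

The resulting neighbour lists are the kind of input a graph-based embedding works from; an approximate structure trades a small loss in neighbour accuracy for a much lower construction cost.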

Randomized SVD

The randomized SVD is based on the paper by Halko-Tropp.
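As an illustration of the idea behind randomized SVD, the sketch below implements only its first stage, the randomized range finder: multiply the matrix A by a Gaussian test matrix, orthonormalise the result to get Q, and check that A is close to Q Qᵀ A. This is a simplified, dependency-free sketch and not the crate's implementation; all names (`XorShift`, `gaussian_matrix`, `randomized_range`, `residual_norm`) are hypothetical, and Gram-Schmidt stands in for the QR factorisation a real implementation would likely use.

```rust
type Matrix = Vec<Vec<f64>>; // dense, row-major

/// Tiny xorshift64* generator so the sketch needs no external crate.
struct XorShift(u64);

impl XorShift {
    /// Uniform sample in [0, 1).
    fn uniform(&mut self) -> f64 {
        self.0 ^= self.0 >> 12;
        self.0 ^= self.0 << 25;
        self.0 ^= self.0 >> 27;
        (self.0.wrapping_mul(0x2545F4914F6CDD1D) >> 11) as f64 / (1u64 << 53) as f64
    }
    /// Standard normal sample (Box-Muller transform).
    fn gauss(&mut self) -> f64 {
        let u1 = 1.0 - self.uniform(); // in (0, 1], so ln(u1) is finite
        let u2 = self.uniform();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

fn gaussian_matrix(rows: usize, cols: usize, rng: &mut XorShift) -> Matrix {
    (0..rows).map(|_| (0..cols).map(|_| rng.gauss()).collect()).collect()
}

fn transpose(a: &Matrix) -> Matrix {
    (0..a[0].len()).map(|j| a.iter().map(|row| row[j]).collect()).collect()
}

fn matmul(a: &Matrix, b: &Matrix) -> Matrix {
    let mut c = vec![vec![0.0; b[0].len()]; a.len()];
    for i in 0..a.len() {
        for l in 0..b.len() {
            for j in 0..b[0].len() {
                c[i][j] += a[i][l] * b[l][j];
            }
        }
    }
    c
}

/// Orthonormalise the columns of `y` with modified Gram-Schmidt,
/// dropping columns that are numerically dependent on earlier ones.
fn orthonormalize_columns(y: &Matrix) -> Matrix {
    let mut basis: Vec<Vec<f64>> = Vec::new();
    for mut col in transpose(y) {
        let before = col.iter().map(|x| x * x).sum::<f64>().sqrt();
        for q in &basis {
            let dot: f64 = q.iter().zip(&col).map(|(a, b)| a * b).sum();
            for (x, qi) in col.iter_mut().zip(q) {
                *x -= dot * qi;
            }
        }
        let after = col.iter().map(|x| x * x).sum::<f64>().sqrt();
        if after > 1e-8 * before {
            col.iter_mut().for_each(|x| *x /= after);
            basis.push(col);
        }
    }
    transpose(&basis)
}

/// Range finder: Q spans (approximately) the range of `a`, using `l` random samples.
fn randomized_range(a: &Matrix, l: usize, rng: &mut XorShift) -> Matrix {
    let omega = gaussian_matrix(a[0].len(), l, rng);
    orthonormalize_columns(&matmul(a, &omega))
}

/// Frobenius norm of A - Q Q^T A.
fn residual_norm(a: &Matrix, q: &Matrix) -> f64 {
    let approx = matmul(q, &matmul(&transpose(q), a));
    a.iter()
        .zip(&approx)
        .flat_map(|(ra, rb)| ra.iter().zip(rb).map(|(x, y)| (x - y).powi(2)))
        .sum::<f64>()
        .sqrt()
}

fn main() {
    let mut rng = XorShift(0x9E3779B97F4A7C15);
    // Exactly rank-3 test matrix: product of 30x3 and 3x20 Gaussian factors.
    let a = matmul(&gaussian_matrix(30, 3, &mut rng), &gaussian_matrix(3, 20, &mut rng));
    let q = randomized_range(&a, 5, &mut rng); // target rank 3 plus 2 oversamples
    println!("basis vectors kept: {}", q[0].len());
    println!("residual |A - Q Q^T A|_F = {:e}", residual_norm(&a, &q));
}
```

In a full randomized SVD, the small matrix Qᵀ A would then be decomposed with a standard SVD and its left singular vectors multiplied by Q; that second stage is omitted here.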

Mapper

Docs

The documentation uses KaTeX (see the katex_doc crate) to render some formulas. To build the doc with display of equations, set in your environment:

RUSTDOCFLAGS="--html-in-header katex-header.html"

and run cargo rustdoc -- --html-in-header katex.html

References

License

Licensed under either of

  1. Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)

  2. MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

This software was written on my own while working at CEA.