This is a Rust implementation of Machine Learning (ML) methods for confident prediction (e.g., Conformal Predictors) and related methods introduced in the book *Algorithmic Learning in a Random World* (ALRW).
Implementing these methods is the main goal of this library; the fact that a method is mentioned here does not imply that it has already been implemented (see the checklists at the end of this document).
Include the following in your `Cargo.toml`:
[dependencies]
random-world = "0.2.0"
The following example uses a deterministic (i.e., non-smooth) Conformal Predictor with a k-NN nonconformity measure (`k=2`) and significance level `epsilon=0.3`. The prediction region will contain the correct label with probability `1-epsilon`.
```rust
#[macro_use(array)]
extern crate ndarray;
extern crate random_world;

use random_world::cp::*;
use random_world::ncm::*;

// Create a k-NN nonconformity measure (k=2)
let ncm = KNN::new(2);

// Create a Conformal Predictor with the chosen nonconformity
// measure and significance level 0.3.
let mut cp = CP::new(ncm, Some(0.3));

// Create a dataset
let train_inputs = array![[0., 0.],
                          [1., 0.],
                          [0., 1.],
                          [1., 1.],
                          [2., 2.],
                          [1., 2.]];
let train_targets = array![0, 0, 0, 1, 1, 1];
let test_inputs = array![[2., 1.],
                         [2., 2.]];

// Train and predict
cp.train(&train_inputs.view(), &train_targets.view())
  .expect("Failed to train the model");
let preds = cp.predict(&test_inputs.view())
              .expect("Failed to predict");
assert!(preds == array![[false, true],
                        [false, true]]);
```
Please read the docs for more examples.
random-world provides standalone binaries for its main functionalities.

`bin/cp` trains a CP on a training set and uses it to predict a test set. Each dataset should be contained in a CSV file whose rows have the form:

`label, x1, x2, ...`

where `label` is a label id (label ids must start from 0), and `x1, x2, ...` are the values forming a feature vector.
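For illustration, the training set from the Rust example above could be encoded as follows (assuming no header row):

```
0,0.0,0.0
0,1.0,0.0
0,0.0,1.0
1,1.0,1.0
1,2.0,2.0
1,1.0,2.0
```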
Results are returned in a CSV file with rows:

`p1, p2, ...`

where each value is either a prediction (true/false) or a p-value (a float in [0, 1]), depending on the chosen output; each row contains one value per label (L values for L labels).
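As an illustration (with made-up p-values), the output for two test objects in a binary classification problem might look like:

```
0.08, 0.91
0.05, 0.88
```

Each row corresponds to one test object and each column to one label.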
Example:

$ ./cp knn -k 1 predictions.csv train_data.csv test_data.csv

This runs CP with a k-NN nonconformity measure (k=1) on `train_data.csv`, predicts `test_data.csv`, and stores the output into `predictions.csv`.
The default output is p-values; to output actual predictions, specify a significance level with `--epsilon`.
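For instance, assuming `--epsilon` takes the significance level as its value, something like the following should produce true/false predictions at level 0.3 (check `./cp -h` for the exact flag placement):

$ ./cp knn -k 1 --epsilon 0.3 predictions.csv train_data.csv test_data.csv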
To run CP in on-line mode on a dataset (i.e., predict one object at a time and then append it to the training examples), only specify
the training file:
$ ./cp knn -k 1 predictions.csv train_data.csv
More options are documented in the help:
$ ./cp -h
Methods:
- [x] Deterministic and smoothed Conformal Predictors (aka transductive CP)
- [ ] Deterministic and smoothed Inductive Conformal Predictors (ICP)
- [x] Plug-in martingales for exchangeability testing
- [ ] Venn Predictors

Nonconformity measures:
- [x] k-NN
- [ ] KDE
- [ ] Generic wrapper around existing ML scorers (e.g., rusty-machine)

Bindings:
- [ ] Python bindings

Binaries:
- [x] CP (both batch prediction and on-line)
- [x] Martingales