# autograd


Differentiable operations and tensors backed by ndarray.

## Motivation

Machine learning is one of the fields where Rust lags behind other languages. The aim of this crate is to show that Rust can implement an efficient and full-featured dataflow graph naturally. Moreover, the core of this crate is quite small compared to others (thanks to being implemented in pure Rust on top of ndarray), so it may also be approachable for those who want to learn how this kind of library works.

## Installation

```toml
[dependencies]
autograd = { version = "1.1.0", features = ["mkl"] }
```

The `mkl` feature is recommended to speed up linear algebra operations using Intel MKL.

## rustc version

Tested with rustc 1.38 ..= 1.42

## Features

### Lazy, lightweight tensor evaluation

Computation graphs are created on the fly (a.k.a. define-by-run), but are not evaluated until `eval` is called. This mechanism strikes a balance between performance and flexibility.

```rust
use autograd as ag;

ag::with(|g: &mut ag::Graph<_>| {
    let a: ag::Tensor<f32> = g.ones(&[60]);
    let b: ag::Tensor<f32> = g.ones(&[24]);
    let c: ag::Tensor<f32> = g.reshape(a, &[3, 4, 5]);
    let d: ag::Tensor<f32> = g.reshape(b, &[4, 3, 2]);
    let e: ag::Tensor<f32> = g.tensordot(c, d, &[1, 0], &[0, 1]);
    e.eval(&[]); // Getting `ndarray::Array` here.
});
```
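Because evaluation is deferred, several symbolic tensors can also be requested at once with `Graph::eval` (the same call used in the neural-network example below). Here is a minimal sketch; it assumes `reduce_sum` takes the same `(tensor, axes, keep_dims)` arguments as the `reduce_mean` used later, and that `eval` returns one result per requested tensor:

```rust
use autograd as ag;

ag::with(|g: &mut ag::Graph<f32>| {
    let x = g.ones(&[2, 3]);
    let y = g.ones(&[3, 2]);
    let p = g.matmul(x, y);                  // still symbolic, nothing computed yet
    let s = g.reduce_sum(p, &[0, 1], false); // also symbolic

    // Evaluate both nodes with a single call.
    let results = g.eval(&[p, s], &[]);
    println!("{:?}", results[0]); // 2x2 matrix filled with 3.0
    println!("{:?}", results[1]); // scalar 12.0
});
```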

### Reverse-mode automatic differentiation

There are a lot of built-in operations that support higher-order derivatives, and you can also define your own differentiable ops with ndarrays easily.

Here we are just computing partial derivatives of `z = 2x^2 + 3y + 1`.

```rust
use autograd as ag;

ag::with(|g: &mut ag::Graph<_>| {
    let x = g.placeholder(&[]);
    let y = g.placeholder(&[]);
    let z = 2.*x*x + 3.*y + 1.;

    // dz/dy
    let gy = &g.grad(&[z], &[y])[0];
    println!("{:?}", gy.eval(&[]));   // => Ok(3.)

    // dz/dx (requires filling the placeholder `x`)
    let gx = &g.grad(&[z], &[x])[0];
    let feed = ag::ndarray::arr0(2.);
    println!("{:?}", gx.eval(&[x.given(feed.view())]));  // => Ok(8.)

    // ddz/dx (differentiates `z` again)
    let ggx = &g.grad(&[gx], &[x])[0];
    println!("{:?}", ggx.eval(&[]));  // => Ok(4.)
});
```
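Defining your own op boils down to implementing the `Op` trait with a forward `compute` and a symbolic `grad`. The sketch below follows the `ComputeContext`/`GradientContext` style shown in the crate's 1.x docs; exact trait signatures can differ between versions, so take it as an outline rather than verbatim code:

```rust
use autograd as ag;
use ag::tensor::Input;

struct Sigmoid;

impl<T: ag::Float> ag::op::Op<T> for Sigmoid {
    // Forward pass: compute the output ndarray from the input view.
    fn compute(&self, ctx: &mut ag::op::ComputeContext<T>) {
        let x = &ctx.input(0);
        let half = T::from(0.5).unwrap();
        // sigmoid(x) = 0.5 * tanh(0.5 * x) + 0.5
        ctx.append_output(x.mapv(move |a| (a * half).tanh() * half + half));
    }

    // Backward pass: express the input gradient symbolically, so it stays differentiable.
    fn grad(&self, ctx: &mut ag::op::GradientContext<T>) {
        let gy = ctx.output_grad();
        let y = ctx.output();
        // d sigmoid / dx = y * (1 - y)
        let gx = gy * (y - ctx.graph().square(y));
        ctx.append_input_grad(Some(gx));
    }
}

// Wrap the op in a user-facing symbolic function.
fn sigmoid<'g, F: ag::Float>(x: &ag::Tensor<'g, F>, g: &'g ag::Graph<F>) -> ag::Tensor<'g, F> {
    ag::Tensor::builder()
        .set_inputs(&[Input::new(x)])
        .build(g, Sigmoid)
}
```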

### Neural networks

This crate has various low-level features inspired by tensorflow/theano to train neural networks. Since computation graphs require only the bare minimum of heap allocations, the overhead is small, even for complex networks.

```rust
// This is a softmax regression for MNIST digits classification with Adam.
// This achieves 0.918 test accuracy after 3 epochs (0.11 sec/epoch on 2.7GHz Intel Core i5).
use autograd::{self as ag, Graph, optimizers::adam, ndarray_ext as arr, tensor::Variable};

let rng = ag::ndarray_ext::ArrayRng::<f32>::default();
let w_arr = arr::into_shared(rng.glorot_uniform(&[28 * 28, 10]));
let b_arr = arr::into_shared(arr::zeros(&[1, 10]));
let adam_state = adam::AdamState::new(&[&w_arr, &b_arr]);

let max_epoch = 3;

for epoch in 0..max_epoch {
    ag::with(|g| {
        let w = g.variable(w_arr.clone());
        let b = g.variable(b_arr.clone());
        let x = g.placeholder(&[-1, 28*28]);
        let y = g.placeholder(&[-1]);
        let z = g.matmul(x, w) + b;
        let mean_loss = g.reduce_mean(g.sparse_softmax_cross_entropy(z, &y), &[0], false);
        let grads = &g.grad(&[&mean_loss], &[w, b]);
        let update_ops: &[ag::Tensor<f32>] =
            &adam::Adam::default().compute_updates(&[w, b], grads, &adam_state, g);

        // let batch_size = 200isize;
        // let num_samples = x_train.shape()[0];
        // let num_batches = num_samples / batch_size as usize;
        // for i in get_permutation(num_batches) {
        //     let i = i as isize * batch_size;
        //     let x_batch = x_train.slice(s![i..i + batch_size, ..]).into_dyn();
        //     let y_batch = y_train.slice(s![i..i + batch_size, ..]).into_dyn();
        //     g.eval(update_ops, &[x.given(x_batch), y.given(y_batch)]);
        // }
    });
}
```

ConvNet and LSTM examples can be found in the `examples` directory.

### Hooks

You can register hooks on `ag::Tensor` objects for debugging.

```rust
use autograd as ag;

ag::with(|g| {
    let a: ag::Tensor<f32> = g.zeros(&[4, 2]).show();
    let b: ag::Tensor<f32> = g.ones(&[2, 3]).show_shape();
    let c = g.matmul(a, b).show_with("MatMul:");

    c.eval(&[]);
    // [[0.0, 0.0],
    //  [0.0, 0.0],
    //  [0.0, 0.0],
    //  [0.0, 0.0]] shape=[4, 2], strides=[2, 1], layout=C (0x1)
    //
    // [2, 3]
    //
    // MatMul:
    //  [[0.0, 0.0, 0.0],
    //   [0.0, 0.0, 0.0],
    //   [0.0, 0.0, 0.0],
    //   [0.0, 0.0, 0.0]] shape=[4, 3], strides=[3, 1], layout=C (0x1), dynamic ndim=2
});
```

For more, see the documentation or the examples.