
# Generic Automatic Differentiation (GAD)

This library provides automatic differentiation by backward propagation (aka "autograd") in Rust. It was designed to allow first-class user extensions (e.g. with new array types or new operators) and to support multiple modes of execution with minimal overhead.

The following modes of execution are currently supported for all library-defined operators:

* first-order differentiation,
* higher-order differentiation,
* forward-only evaluation, and
* dimension checking.

## Design Principles

The core of this library consists of a tape-based implementation of automatic differentiation in reverse mode.

We have chosen to prioritize idiomatic Rust in order to make this library as reusable as possible.

While this library is primarily motivated by machine learning applications, it is meant to cover other use cases of automatic differentiation in reverse mode. In the sections below, we show how a user may define new operators and add new modes of execution while retaining automatic differentiability.

## Limitations

Because the operations of a differentiable algebra mutate a shared tape and may fail, differentiable values cannot directly implement the standard Rust operator traits (e.g. `std::ops::Add`). We believe that this state of affairs could be improved in the future using Rust macros. Alternatively, future extensions of the library could define a new category of differentiable values that contain an implicit `RefCell` reference to a common tape and provide (implicitly fallible, thread-unsafe) operator traits for these values.
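
To make this alternative concrete, here is a hypothetical sketch of such a wrapper, assuming the `Graph1` and `Value<f32>` types from the examples below are in scope together with the library's operation traits; the `Var` type and its fields are illustrative and not part of the library:

```rust
use std::cell::RefCell;
use std::ops::Mul;
use std::rc::Rc;

// Hypothetical wrapper: a differentiable value carrying a shared,
// single-threaded handle to its tape, so that standard operator
// traits can be implemented for it.
#[derive(Clone)]
struct Var {
    graph: Rc<RefCell<Graph1>>,
    value: Value<f32>,
}

impl<'a> Mul for &'a Var {
    type Output = Var;

    fn mul(self, other: &'a Var) -> Var {
        // The underlying tape operation is fallible; an operator
        // trait has no way to report errors and can only panic.
        let value = self
            .graph
            .borrow_mut()
            .mul(&self.value, &other.value)
            .expect("tape operation failed");
        Var { graph: self.graph.clone(), value }
    }
}
```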

## Quick Start

To compute gradients, we first build an expression using operations provided by a fresh tape `g` of type `Graph1`. Successive algebraic operations modify the internal state of `g` to track all relevant computations and enable future backward-propagation passes.

We then call `g.evaluate_gradients(..)` to run a backward-propagation algorithm from the desired starting point, using an initial gradient value `direction`.

Unless a one-time optimized variant `g.evaluate_gradients_once(..)` is used, backward propagation with `g.evaluate_gradients(..)` does not modify `g`. This allows successive (or concurrent) backward propagations to be run from different starting points or with different gradient values.

```rust
// A new tape supporting first-order differentials (aka gradients)
let mut g = Graph1::new();
// Compute forward values.
let a = g.variable(1f32);
let b = g.variable(2f32);
let c = g.mul(&a, &b)?;
// Compute the derivatives of `c` relative to `a` and `b`.
let gradients = g.evaluate_gradients(c.gid()?, 1f32)?;
// Read the `dc/da` component.
assert_eq!(*gradients.get(a.gid()?).unwrap(), 2.0);
```
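
Since backward passes with `evaluate_gradients` leave the tape unchanged, we can immediately run another one on the same graph; a minimal sketch continuing the example above (the seed `10f32` is illustrative):

```rust
// Run a second backward pass from the same starting point with a
// different initial gradient value. Gradients scale linearly with
// the seed, so the `dc/da` component becomes 2.0 * 10.0 = 20.0.
let gradients2 = g.evaluate_gradients(c.gid()?, 10f32)?;
assert_eq!(*gradients2.get(a.gid()?).unwrap(), 20.0);
```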

Because `Graph1`, the type of `g`, provides algebraic operations as methods, we refer to such a type below as an "algebra". GAD uses particular Rust traits to represent the set of operations supported by a given algebra.
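
For instance, a formula over scalars can be written once against these traits and instantiated with any algebra; a hedged sketch, assuming the `CoreAlgebra` and `ArithAlgebra` traits described later in this README (the function name `poly` is illustrative). With `A = Eval` it computes plain `f32` values, while with `A = Graph1` the same code builds a differentiable tape.

```rust
// Compute x^2 + x generically over any algebra `A` supporting
// `mul` (from ArithAlgebra) and `add` (from CoreAlgebra) on f32 data.
fn poly<A>(g: &mut A, x: &<A as CoreAlgebra<f32>>::Value) -> Result<<A as CoreAlgebra<f32>>::Value>
where
    A: CoreAlgebra<f32>,
    A: ArithAlgebra<<A as CoreAlgebra<f32>>::Value>,
{
    let x2 = g.mul(x, x)?; // x * x
    g.add(&x2, x)          // x^2 + x
}
```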

## Computations with Arrayfire

The default array operations of the library are currently based on Arrayfire, a portable array library supporting GPUs and JIT-compilation.

```rust
use arrayfire as af;

// A new tape supporting first-order differentials (aka gradients)
let mut g = Graph1::new();
// Compute forward values using Arrayfire arrays.
let dims = af::Dim4::new(&[4, 3, 1, 1]);
let a = g.variable(af::randu::<f32>(dims));
let b = g.variable(af::randu::<f32>(dims));
let c = g.mul(&a, &b)?;
// Compute the gradients of `c`.
let direction = af::constant(1f32, dims);
let gradients = g.evaluate_gradients_once(c.gid()?, direction)?;
```

## Using Generics for Forward Evaluation and Fast Dimension Checking

The algebra `Graph1` used in the examples above is one choice amongst several "default" algebras offered by the library:

* `Eval` for forward-only evaluation,
* `Check` for fast dimension checking,
* `Graph1` for first-order differentiation, and
* `GraphN` for higher-order differentiation.

Users are encouraged to program formulas in a generic way so that any of the default algebras can be chosen.

The following example illustrates such a programming style in the case of array operations:

```rust
use arrayfire as af;

fn get_value<A>(g: &mut A) -> Result<<A as AfAlgebra<f32>>::Value>
where
    A: AfAlgebra<f32>,
{
    let dims = af::Dim4::new(&[4, 3, 1, 1]);
    let a = g.variable(af::randu::<f32>(dims));
    let b = g.variable(af::randu::<f32>(dims));
    g.mul(&a, &b)
}

// Direct evaluation. The result type is a primitive (non-differentiable) value.
let mut g = Eval::default();
let c: af::Array<f32> = get_value(&mut g)?;

// Fast dimension-checking. The result type is a dimension.
let mut g = Check::default();
let d: af::Dim4 = get_value(&mut g)?;
assert_eq!(c.dims(), d);
```

## Higher-Order Differentials

Higher-order differentials are computed using the algebra `GraphN`. In this case, gradients are values whose computation is also tracked.

```rust
// A new tape supporting differentials of any order.
let mut g = GraphN::new();

// Compute forward values using floats.
let x = g.variable(1.0f32);
let y = g.variable(0.4f32);
// z = x * y^2
let z = {
    let h = g.mul(&x, &y)?;
    g.mul(&h, &y)?
};
// Use short names for gradient ids.
let (x, y, z) = (x.gid()?, y.gid()?, z.gid()?);

// Compute gradient.
let dz = g.constant(1f32);
let dz_d = g.compute_gradients(z, dz)?;
let dz_dx = dz_d.get(x).unwrap();

// Compute some 2nd-order differentials.
let ddz = g.constant(1f32);
let ddz_dxd = g.compute_gradients(dz_dx.gid()?, ddz)?;
let ddz_dxdy = ddz_dxd.get(y).unwrap().data();
assert_eq!(*ddz_dxdy, 0.8); // 2y

// Compute some 3rd-order differentials.
let dddz = g.constant(1f32);
let dddz_dxdyd = g.compute_gradients(ddz_dxd.get(y).unwrap().gid()?, dddz)?;
let dddz_dxdydy = dddz_dxdyd.get(y).unwrap().data();
assert_eq!(*dddz_dxdydy, 2.0); // 2
```

## Extending Automatic Differentiation

### Operations and algebras

The default algebras `Eval`, `Check`, `Graph1`, and `GraphN` are meant to provide interchangeable sets of operations in each of the default modes of operation (respectively, evaluation, dimension checking, first-order differentiation, and higher-order differentiation).

Default operations are grouped into several traits named `*Algebra` and implemented by each of the four default algebras above.

The motivation for using several `*Algebra` traits is twofold:

* users may define new operations by introducing their own traits of this form (see "User-defined operations" below), and
* users may define new algebras that implement only a subset of the operation traits (see "User-defined algebras" below).

The following example illustrates gradient computations over integers:

```rust
let mut g = Graph1::new();
let a = g.variable(1i32);
let b = g.variable(2i32);
let c = g.sub(&a, &b)?;
assert_eq!(*c.data(), -1);
let gradients = g.evaluate_gradients_once(c.gid()?, 1)?;
assert_eq!(*gradients.get(a.gid()?).unwrap(), 1);
assert_eq!(*gradients.get(b.gid()?).unwrap(), -1);
```

### User-defined operations

Users may define new differentiable operations by defining their own `*Algebra` trait and providing implementations for the default algebras `Eval`, `Check`, `Graph1`, and `GraphN`.

In the following example, we define a new operation `square` over integers and af-arrays and add support for first-order differentials:

```rust
use arrayfire as af;

pub trait UserAlgebra<Value> {
    fn square(&mut self, v: &Value) -> Result<Value>;
}

impl UserAlgebra<i32> for Eval {
    fn square(&mut self, v: &i32) -> Result<i32> {
        Ok(v * v)
    }
}

impl<T> UserAlgebra<af::Array<T>> for Eval
where
    T: af::HasAfEnum + af::ImplicitPromote<T, Output = T>,
{
    fn square(&mut self, v: &af::Array<T>) -> Result<af::Array<T>> {
        Ok(v * v)
    }
}

impl<D> UserAlgebra<Value<D>> for Graph1
where
    Eval: CoreAlgebra<D, Value = D>
        + UserAlgebra<D>
        + ArithAlgebra<D>
        + LinkedAlgebra<Value<D>, D>,
    D: HasDims + Clone + 'static + Send + Sync,
    D::Dims: PartialEq + std::fmt::Debug + Clone + 'static + Send + Sync,
{
    fn square(&mut self, v: &Value<D>) -> Result<Value<D>> {
        let result = self.eval().square(v.data())?;
        let value = self.make_node(result, vec![v.input()], {
            let v = v.clone();
            move |graph, store, gradient| {
                if let Some(id) = v.id() {
                    let c = graph.link(&v);
                    // d(v^2) = gradient * v + v * gradient
                    let grad1 = graph.mul(&gradient, c)?;
                    let grad2 = graph.mul(c, &gradient)?;
                    let grad = graph.add(&grad1, &grad2)?;
                    store.add_gradient(graph, id, &grad)?;
                }
                Ok(())
            }
        });
        Ok(value)
    }
}

fn main() -> Result<()> {
    let mut g = Graph1::new();
    let a = g.variable(3i32);
    let b = g.square(&a)?;
    assert_eq!(*b.data(), 9);
    let gradients = g.evaluate_gradients_once(b.gid()?, 1)?;
    assert_eq!(*gradients.get(a.gid()?).unwrap(), 6);
    Ok(())
}
```

The implementation for `GraphN` would be identical to the one for `Graph1`. We have omitted dimension checking for simplicity. We refer readers to the test files of the library for a more complete example.

### User-defined algebras

Users may define new "evaluation" algebras (similar to `Eval`) by implementing a subset of the operation traits that includes `CoreAlgebra<Data, Value = Data>` for each supported data type `Data`.

An evaluation-only algebra can be turned into algebras supporting differentiation (similar to `Graph1` and `GraphN`) using the `Graph` construction provided by the library.

The following example illustrates how to define a new evaluation algebra `SymEval`, then deduce its counterpart `SymGraph1`:

```rust
use std::sync::Arc;

/// A custom algebra for forward-only symbolic evaluation.
#[derive(Clone, Default)]
struct SymEval;

/// Symbolic expressions of type `T`.
#[derive(Debug, PartialEq)]
enum Exp_<T> {
    Num(T),
    Neg(Exp<T>),
    Add(Exp<T>, Exp<T>),
    Mul(Exp<T>, Exp<T>),
}

type Exp<T> = Arc<Exp_<T>>;

impl<T> CoreAlgebra<Exp<T>> for SymEval {
    type Value = Exp<T>;
    fn variable(&mut self, data: Exp<T>) -> Self::Value {
        data
    }
    fn constant(&mut self, data: Exp<T>) -> Self::Value {
        data
    }
    fn add(&mut self, v1: &Self::Value, v2: &Self::Value) -> Result<Self::Value> {
        Ok(Arc::new(Exp_::Add(v1.clone(), v2.clone())))
    }
}

impl<T> ArithAlgebra<Exp<T>> for SymEval {
    fn neg(&mut self, v: &Exp<T>) -> Exp<T> {
        Arc::new(Exp_::Neg(v.clone()))
    }
    fn sub(&mut self, v1: &Exp<T>, v2: &Exp<T>) -> Result<Exp<T>> {
        let v2 = self.neg(v2);
        Ok(Arc::new(Exp_::Add(v1.clone(), v2)))
    }
    fn mul(&mut self, v1: &Exp<T>, v2: &Exp<T>) -> Result<Exp<T>> {
        Ok(Arc::new(Exp_::Mul(v1.clone(), v2.clone())))
    }
}

// No dimension checks.
impl<T> HasDims for Exp_<T> {
    type Dims = ();
    fn dims(&self) {}
}

impl<T: std::fmt::Display> std::fmt::Display for Exp_<T> {
    // ...
}

/// Apply the `Graph` construction to derive an algebra supporting gradients.
type SymGraph1 = Graph<Config1<SymEval>>;

fn main() -> Result<()> {
    let mut g = SymGraph1::new();
    let a = g.variable(Arc::new(Exp_::Num("a")));
    let b = g.variable(Arc::new(Exp_::Num("b")));
    let c = g.mul(&a, &b)?;
    let d = g.mul(&a, &c)?;
    assert_eq!(format!("{}", d.data()), "aab");
    let gradients = g.evaluate_gradients_once(d.gid()?, Arc::new(Exp_::Num("1")))?;
    assert_eq!(format!("{}", gradients.get(a.gid()?).unwrap()), "(1ab+a1b)");
    assert_eq!(format!("{}", gradients.get(b.gid()?).unwrap()), "aa1");
    Ok(())
}
```

## Contributing

See the CONTRIBUTING file for how to help out.

## License

This project is available under the terms of either the Apache 2.0 license or the MIT license.