WONNX


Wonnx is a GPU-accelerated ONNX inference run-time written 100% in Rust, ready for the web.

Supported Platforms (enabled by wgpu)

| API    | Windows        | Linux & Android | macOS & iOS |
| ------ | -------------- | --------------- | ----------- |
| Vulkan | ✅             | ✅              |             |
| Metal  |                |                 | ✅          |
| DX12   | ✅ (W10 only)  |                 |             |
| DX11   | :construction: |                 |             |
| GLES3  |                | :ok:            |             |

:white_check_mark: = First Class Support — :ok: = Best Effort Support — :construction: = Unsupported, but support in progress

Getting started

From the command line

Ensure your system supports Vulkan, Metal, or DX12 for access to the GPU. Then either download a binary release, or install Rust and run `cargo install --git https://github.com/webonnx/wonnx.git wonnx-cli` to install the CLI.

The CLI tool (nnx) provides a convenient interface for tinkering with models (see the README for more information):

```bash
nnx info ./data/models/opt-squeeze.onnx
nnx infer ./data/models/opt-squeeze.onnx -i data=./data/images/pelican.jpeg --labels ./data/models/squeeze-labels.txt --top 3
```

From Rust

Add the wonnx crate as a dependency (`cargo add wonnx`). Then, see the examples for usage, or browse the API docs.
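As a quick start, something like the following should work. This is a minimal sketch, not a definitive example: `Session::from_path`, the `&[f32]` → `InputTensor` conversion, and blocking via pollster mirror the test shown later in this README; the model path and the tensor names `x`/`y` are taken from the Python example below, so adjust them to your model.

```rust
use std::collections::HashMap;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Session creation and inference are async; block on them with pollster.
    let session = pollster::block_on(wonnx::Session::from_path(
        "data/models/single_relu.onnx", // placeholder path; adjust as needed
    ))?;

    // Input names must match the model; single_relu takes "x" and yields "y".
    let input = [-1.0f32, 2.0];
    let mut inputs = HashMap::new();
    inputs.insert("x".to_string(), input.as_slice().into());

    let outputs = pollster::block_on(session.run(&inputs))?;
    println!("{:?}", outputs["y"]); // expected: [0.0, 2.0]
    Ok(())
}
```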

From Python

```bash
pip install wonnx
```

And then, to use:

```python
from wonnx import PySession

session = PySession.from_path("../data/models/single_relu.onnx")
inputs = {"x": [-1.0, 2.0]}
assert session.run(inputs) == {"y": [0.0, 2.0]}
```

Then run the above code with `python3`!

For more details on the Python package including build instructions, see wonnx-py.

In the browser, using WebGPU + WebAssembly

```bash
npm install @webonnx/wonnx-wasm
```

And then, on the client side:

```js
import init, { Session, Input } from "@webonnx/wonnx-wasm";

// Check for WebGPU availability first: if (navigator.gpu) { ... }
await init();
const session = await Session.fromBytes(modelBytes /* Uint8Array containing the ONNX file */);
const input = new Input();
input.insert("x", [13.0, -37.0]);
const result = await session.run(input);
// `result` is an object whose keys are the names of the model outputs and
// whose values are arrays of numbers.
session.free();
input.free();
```

The package @webonnx/wonnx-wasm provides an interface to WONNX, which is included as a WebAssembly module and uses the browser's WebGPU implementation. See wonnx-wasm-example for a more complete usage example involving a bundler.

For more details on the JS/WASM package including build instructions, see wonnx-wasm.

For development

To work on wonnx itself, follow these steps:

```bash
git clone https://github.com/webonnx/wonnx.git

# With Git LFS
git lfs install

# With the download link
wget https://wonnx.s3.eu-west-3.amazonaws.com/data.zip
rm -rf data
unzip data.zip
```

Ensure Git LFS is initialized and has downloaded the model files (in wonnx/examples/data/models). Then, you're all set!

You can run one of the included examples through cargo:

```bash
cargo run --example squeeze --release
```

Running other models

Other models may first need to be simplified with onnx-simplifier, which folds constant subgraphs (such as shape computations) away:

```bash
pip install -U pip && pip install onnx-simplifier
python -m onnxsim mnist-8.onnx opt-mnist.onnx
```

```bash
cargo run --example mnist --release
```

Examples are available in the examples folder.

Tested models

- MNIST
- SqueezeNet

GPU selection

Except when running in WebAssembly, you may set the following environment variables to influence GPU selection by wgpu:

- `WGPU_ADAPTER_NAME`: a substring of the name of the adapter you want to use (e.g. `1080` matches `GeForce GTX 1080`).
- `WGPU_BACKEND`: a comma-separated list of the backends to consider (`vulkan`, `metal`, `dx12`, `dx11`, or `gl`).
- `WGPU_POWER_PREF`: the power preference for adapter selection (`low` or `high`).
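As a sketch (assuming wgpu reads these variables at adapter-selection time, so they take effect only if set before the first session is created), you can also pin them from Rust; setting them in the shell works just as well:

```rust
// Equivalent to `WGPU_BACKEND=vulkan WGPU_POWER_PREF=high cargo run ...`.
// Note: on Rust edition 2024, std::env::set_var must be wrapped in `unsafe { .. }`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    std::env::set_var("WGPU_BACKEND", "vulkan"); // consider only the Vulkan backend
    std::env::set_var("WGPU_POWER_PREF", "high"); // prefer the discrete GPU

    // "model.onnx" is a placeholder path.
    let session = pollster::block_on(wonnx::Session::from_path("model.onnx"))?;
    // ... run inference as usual ...
    let _ = session;
    Ok(())
}
```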

Contribution: On implementing a new Operator

Contributions are very much welcome, even without extensive experience in DL, WGSL, or Rust. I hope this project can be a sandbox for all of us to learn more about those technologies beyond its initial scope.

To implement an operator, all you have to do is:

1. Add a new matching pattern in `compiler.rs`.
2. Retrieve its attribute values using the `get_attribute` function:

   ```rust
   let alpha = get_attribute("alpha", Some(1.0), node);
   // or without a default value
   let alpha = get_attribute::<f32>("alpha", None, node);
   ```

3. Add any variable you want to use in the WGSL shader using `context`.
4. Write a new WGSL template in the `templates` folder.

   Available types are in `structs.wgsl`, but you can also generate new ones within your templates.

5. Respect the binding layout: each entry's binding index is incremented by 1, starting from 0, with inputs first and outputs last. If the number of bindings goes above 4, increment the binding group. You can change the inputs within `sequencer.rs`.
6. Write the logic.

   There are default variables available in the template context (the sketch after this list shows them in use):

   - `{{ i_lens[0] }}`: the length of input 0. This also works for outputs (`{{ o_lens[0] }}`) and other inputs (`{{ i_lens[1] }}`).
   - `{{ i_shape[0] }}`: the array of dimensions of input 0. To get its first dimension, use `{{ i_shape[0][0] }}`.
   - `{{ i_chunks[0] }}`: the chunk size of each dimension of input 0. By default, each variable is stored as one long array of values, and reaching a specific value means moving through it chunk by chunk; those chunk sizes are kept in this variable. To get the chunk size of the first dimension, use `{{ i_chunks[0][0] }}`.
   - `{{ op_type }}`: the op type, since some ops (e.g. activations) share the same template.

7. Test it using the utility functions and place it in the `tests` folder. The test can look as follows:

   ```rust
   use std::collections::HashMap;
   use wonnx::utils::{graph, model, node, tensor}; // model-building helpers

   #[test]
   fn test_matmul_square_matrix() {
       // USER INPUT
       let n = 16;
       let mut input_data = HashMap::new();

       let data_a = ndarray::Array2::eye(n);
       let mut data_b = ndarray::Array2::<f32>::zeros((n, n));
       data_b[[0, 0]] = 0.2;
       data_b[[0, 1]] = 0.5;

       let sum = data_a.dot(&data_b);

       input_data.insert("A".to_string(), data_a.as_slice().unwrap().into());
       input_data.insert("B".to_string(), data_b.as_slice().unwrap().into());

       let n = n as i64;
       let model = model(graph(
           vec![tensor("A", &[n, n]), tensor("B", &[n, n])],
           vec![tensor("C", &[n, n])],
           vec![],
           vec![],
           vec![node(vec!["A", "B"], vec!["C"], "MatMul", "MatMul", vec![])],
       ));

       let session =
           pollster::block_on(wonnx::Session::from_model(model)).expect("Session did not create");

       let result = pollster::block_on(session.run(&input_data)).unwrap();

       // Note: it is better to compare floats with a tolerance to account for differences
       // between implementations; see `wonnx/tests/common/mod.rs` for an example.
       assert_eq!((&result["C"]).try_into().unwrap(), sum.as_slice().unwrap());
   }
   ```

Check out the Tera documentation for other templating operations: https://tera.netlify.app/docs/

8. If at any point you want to optimize across several nodes, you can do so within `sequencer.rs`.
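To make the template variables concrete, here is a hypothetical, minimal operator template in the style of the files in the `templates` folder. It is only a sketch: the `Array` struct is declared inline to keep it self-contained (real templates use the shared types from `structs.wgsl`), and the Relu body is purely illustrative.

```wgsl
{# Hypothetical minimal template (Tera + WGSL); not an actual wonnx template. #}
struct Array {
    data: array<f32>,
};

@group(0) @binding(0) var<storage, read> input_0: Array;        // inputs bind first...
@group(0) @binding(1) var<storage, read_write> output_0: Array; // ...outputs bind last

@compute @workgroup_size(256)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    // {{ i_lens[0] }} is expanded by Tera to the length of input 0
    // when the shader source is generated.
    if (gid.x < {{ i_lens[0] }}u) {
        output_0.data[gid.x] = max(input_0.data[gid.x], 0.0); // e.g. a Relu
    }
}
```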

Supported Operators (ref ONNX IR)

|Operator|Since version|Implemented|
|-|-|-|
|Abs|13, 6, 1|✅|
|Acos|7|✅|
|Acosh|9||
|Add|14, 13, 7, 6, 1|✅|
|And|7, 1|✅|
|ArgMax|13, 12, 11, 1||
|ArgMin|13, 12, 11, 1||
|Asin|7|✅|
|Asinh|9||
|Atan|7|✅|
|Atanh|9||
|AveragePool|11, 10, 7, 1|✅|
|BatchNormalization|15, 14, 9, 7, 6, 1|✅|
|BitShift|11||
|Cast|13, 9, 6, 1|✅|
|Ceil|13, 6, 1|✅|
|Clip|13, 12, 11, 6, 1|✅|
|Compress|11, 9||
|Concat|13, 11, 4, 1|✅|
|ConcatFromSequence|11||
|Constant|13, 12, 11, 9, 1||
|ConstantOfShape|9||
|Conv|11, 1|✅|
|ConvInteger|10||
|ConvTranspose|11, 1||
|Cos|7|✅|
|Cosh|9|✅|
|CumSum|14, 11||
|DepthToSpace|13, 11, 1||
|DequantizeLinear|13, 10||
|Det|11||
|Div|14, 13, 7, 6, 1|✅|
|Dropout|13, 12, 10, 7, 6, 1|✅|
|Einsum|12||
|Elu|6, 1|✅|
|Equal|13, 11, 7, 1|✅|
|Erf|13, 9||
|Exp|13, 6, 1|✅|
|Expand|13, 8||
|EyeLike|9||
|Flatten|13, 11, 9, 1|✅|
|Floor|13, 6, 1|✅|
|GRU|14, 7, 3, 1||
|Gather|13, 11, 1|✅ (axis=0)|
|GatherElements|13, 11||
|GatherND|13, 12, 11||
|Gemm|13, 11, 9, 7, 6, 1|✅*|
|GlobalAveragePool|1|✅|
|GlobalLpPool|2, 1||
|GlobalMaxPool|1||
|Greater|13, 9, 7, 1|✅|
|GridSample|16||
|HardSigmoid|6, 1||
|Hardmax|13, 11, 1||
|Identity|16, 14, 13, 1|✅|
|If|16, 13, 11, 1||
|InstanceNormalization|6, 1||
|IsInf|10||
|IsNaN|13, 9||
|LRN|13, 1||
|LSTM|14, 7, 1||
|LeakyRelu|6, 1|✅|
|Less|13, 9, 7, 1|✅|
|Log|13, 6, 1|✅|
|Loop|16, 13, 11, 1||
|LpNormalization|1||
|LpPool|11, 2, 1||
|MatMul|13, 9, 1|✅|
|MatMulInteger|10||
|Max|13, 12, 8, 6, 1||
|MaxPool|12, 11, 10, 8, 1|✅|
|MaxRoiPool|1||
|MaxUnpool|11, 9||
|Mean|13, 8, 6, 1||
|Min|13, 12, 8, 6, 1|✅|
|Mod|13, 10|✅|
|Mul|14, 13, 7, 6, 1|✅|
|Multinomial|7||
|Neg|13, 6, 1||
|NonMaxSuppression|11, 10||
|NonZero|13, 9||
|Not|1||
|OneHot|11, 9|✅ (axis=-1)|
|Optional|15||
|OptionalGetElement|15||
|OptionalHasElement|15||
|Or|7, 1|✅|
|PRelu|9, 7, 6, 1|✅|
|Pad|13, 11, 2, 1|✅ (mode=constant, pads>=0)|
|Pow|15, 13, 12, 7, 1|✅ (broadcast=0 and data type is f32)|
|QLinearConv|10||
|QLinearMatMul|10||
|QuantizeLinear|13, 10||
|RNN|14, 7, 1||
|RandomNormal|1||
|RandomNormalLike|1||
|RandomUniform|1||
|RandomUniformLike|1||
|Reciprocal|13, 6, 1|✅|
|ReduceL1|13, 11, 1|✅|
|ReduceL2|13, 11, 1|✅|
|ReduceLogSum|13, 11, 1|✅|
|ReduceLogSumExp|13, 11, 1|✅|
|ReduceMax|13, 12, 11, 1|✅|
|ReduceMean|13, 11, 1|✅|
|ReduceMin|13, 12, 11, 1|✅|
|ReduceProd|13, 11, 1|✅|
|ReduceSum|13, 11, 1|✅|
|ReduceSumSquare|13, 11, 1|✅|
|Relu|14, 13, 6, 1|✅|
|Reshape|14, 13, 5, 1|✅|
|Resize|13, 11, 10|✅|
|ReverseSequence|10||
|RoiAlign|16, 10||
|Round|11||
|Scan|11, 9, 8||
|Scatter (deprecated)|11, 9||
|ScatterElements|16, 13, 11||
|ScatterND|16, 13, 11||
|Selu|6, 1||
|SequenceAt|11||
|SequenceConstruct|11||
|SequenceEmpty|11||
|SequenceErase|11||
|SequenceInsert|11||
|SequenceLength|11||
|Shape|15, 13, 1||
|Shrink|9||
|Sigmoid|13, 6, 1|✅|
|Sign|13, 9||
|Sin|7|✅|
|Sinh|9|✅|
|Size|13, 1||
|Slice|13, 11, 10, 1||
|Softplus|1|✅|
|Softsign|1|✅|
|SpaceToDepth|13, 1||
|Split|13, 11, 2, 1||
|SplitToSequence|11||
|Sqrt|13, 6, 1|✅|
|Squeeze|13, 11, 1|✅|
|StringNormalizer|10||
|Sub|14, 13, 7, 6, 1|✅|
|Sum|13, 8, 6, 1||
|Tan|7|✅|
|Tanh|13, 6, 1|✅|
|TfIdfVectorizer|9||
|ThresholdedRelu|10||
|Tile|13, 6, 1||
|TopK|11, 10, 1||
|Transpose|13, 1|✅|
|Trilu|14||
|Unique|11||
|Unsqueeze|13, 11, 1|✅|
|Upsample (deprecated)|10, 9, 7||
|Where|16, 9||
|Xor|7, 1||

|Function|Since version|Implemented|
|-|-|-|
|Bernoulli|15||
|CastLike|15||
|Celu|12|✅|
|DynamicQuantizeLinear|11||
|GreaterOrEqual|12|✅|
|HardSwish|14||
|LessOrEqual|12|✅|
|LogSoftmax|13, 11, 1||
|MeanVarianceNormalization|13, 9||
|NegativeLogLikelihoodLoss|13, 12||
|Range|11||
|Softmax|13, 11, 1|✅|
|SoftmaxCrossEntropyLoss|13, 12||

Known limitations