# nnx

Command-line interface for inference using wonnx
ONNX defines a standardized format for exchanging machine learning models. However, there is currently no easy way to perform one-off inference with such a model without resorting to Python. Installing Python and the required libraries (e.g. TensorFlow and the underlying GPU setup) can be cumbersome. Additionally, model-specific code is always needed to convert inputs (images, text, etc.) to and from the formats the model requires (e.g. image classification models expect their images as fixed-size tensors with pixel values normalized to specific ranges).

This project provides a simple, all-in-one command-line tool that performs inference using ONNX models. Thanks to wonnx, inference runs on the GPU.

NNX tries to make educated guesses about how to transform inputs and outputs for a model. These guesses are only defaults, i.e. it is always possible to override them; the goal is to reduce the amount of configuration needed to run a model. For instance, an image file supplied for a tensor-shaped input is converted automatically, as the examples below show:
```sh
$ nnx infer ./data/models/opt-squeeze.onnx -i data=./data/images/pelican.jpeg --labels ./data/models/squeeze-labels.txt --probabilities
n01608432 kite: 21.820244
n02051845 pelican: 21.112095
n02018795 bustard: 20.359694
n01622779 great grey owl, great gray owl, Strix nebulosa: 20.176003
n04417672 thatch, thatched roof: 19.638676
n02028035 redshank, Tringa totanus: 19.606218
n02011460 bittern: 18.90648
n02033041 dowitcher: 18.708323
n01829413 hornbill: 18.595457
n01616318 vulture: 17.508785

$ nnx infer ./data/models/opt-mnist.onnx -i Input3=./data/images/7.jpg
[-1.2942507, 0.5192305, 8.655695, 9.474595, -13.768464, -5.8907413, -23.467274, 28.252314, -6.7598896, 3.9513395]

$ nnx infer ./data/models/opt-mnist.onnx -i Input3=./data/images/7.jpg --labels ./data/models/mnist-labels.txt --top=1
Seven

$ nnx info ./data/models/opt-mnist.onnx
+------------------+---------------------------------------------------+
| Model version    | 1                                                 |
+------------------+---------------------------------------------------+
| IR version       | 3                                                 |
+------------------+---------------------------------------------------+
| Producer name    | CNTK                                              |
+------------------+---------------------------------------------------+
| Producer version | 2.5.1                                             |
+------------------+---------------------------------------------------+
| Opsets           | 8                                                 |
+------------------+---------------------------------------------------+
| Inputs           | +--------+-------------+-----------+------+       |
|                  | | Name   | Description | Shape     | Type |       |
|                  | +--------+-------------+-----------+------+       |
|                  | | Input3 |             | 1x1x28x28 | f32  |       |
|                  | +--------+-------------+-----------+------+       |
+------------------+---------------------------------------------------+
| Outputs          | +------------------+-------------+-------+------+ |
|                  | | Name             | Description | Shape | Type | |
|                  | +------------------+-------------+-------+------+ |
|                  | | Plus214_Output_0 |             | 1x10  | f32  | |
|                  | +------------------+-------------+-------+------+ |
+------------------+---------------------------------------------------+
| Ops used         | +---------+---------------------+                 |
|                  | | Op      | Attributes          |                 |
|                  | +---------+---------------------+                 |
|                  | | Reshape |                     |                 |
|                  | +---------+---------------------+                 |
|                  | | Gemm    | transA=0            |                 |
|                  | |         | transB=0            |                 |
|                  | |         | beta=1              |                 |
|                  | |         | alpha=1             |                 |
|                  | +---------+---------------------+                 |
|                  | | Relu    |                     |                 |
|                  | +---------+---------------------+                 |
|                  | | Conv    | auto_pad=SAME_UPPER |                 |
|                  | |         | group=1             |                 |
|                  | |         | strides=
```
Replace `nnx` with `cargo run --release --` to run the development version. Set `RUST_LOG=wonnx-cli=info` to see useful logging from the CLI tool, or `RUST_LOG=wonnx=info` to see logging from WONNX.
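For example, a development run with CLI logging enabled might look like this (reusing the MNIST model and image from the examples above):

```sh
# Run a development build of nnx with logging from the CLI tool enabled
RUST_LOG=wonnx-cli=info cargo run --release -- infer ./data/models/opt-mnist.onnx -i Input3=./data/images/7.jpg
```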
## tract

The nnx utility can use tract as a CPU-based backend for ONNX inference. To use it, nnx needs to be compiled with the `cpu` feature enabled. You can then specify one of the following arguments (see the examples below):
* `--backend cpu` to select the CPU backend
* `--fallback` to select the CPU backend when the GPU backend cannot be used (e.g. because of an unsupported operation type)
* `--compare` to run inference on both the CPU and GPU backends and compare the output
* `--benchmark` to run the specified inference a hundred times, then report the performance
* `--compare --benchmark` to run inference on both CPU and GPU a hundred times each and compare the performance
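For instance, assuming nnx was built with the `cpu` feature, the backend selection flags can be combined with the earlier MNIST example like this:

```sh
# Force the CPU (tract) backend
nnx infer ./data/models/opt-mnist.onnx -i Input3=./data/images/7.jpg --backend cpu

# Prefer the GPU, but fall back to tract if an operation is unsupported
nnx infer ./data/models/opt-mnist.onnx -i Input3=./data/images/7.jpg --fallback
```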
A benchmarking example (the result below was obtained on an Apple M1 Max system):

```sh
$ cargo run --release --features=cpu -- infer ./data/models/opt-squeeze.onnx -i data=./data/images/pelican.jpeg --compare --benchmark
OK (gpu=572ms, cpu=1384ms, 2.42x)
```
To generate a test model yourself, first install the required Python packages (`pip install tensorflow onnx tf2onnx`), then create a very simple model for the MNIST digits:
```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Scale pixel values to the 0..1 range
train_images_input = train_images.astype("float32") / 255

model = keras.Sequential([
    layers.Reshape((28 * 28,), input_shape=(28, 28)),
    layers.Dense(512, activation='relu'),
    layers.Dropout(rate=0.01),
    layers.Dense(10, activation='softmax')
])

model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_images_input, train_labels, epochs=20, batch_size=1024)
```
Then convert the model to ONNX and run shape inference on it:

```python
import tf2onnx
import tensorflow as tf
import onnx
from onnx import shape_inference

input_signature = [tf.TensorSpec([1, 28, 28], tf.float32, name='input')]
onnx_model, _ = tf2onnx.convert.from_keras(model, input_signature, opset=13)

inferred_model = shape_inference.infer_shapes(onnx_model)

onnx.save(onnx_model, "tymnist.onnx")
onnx.save(inferred_model, "tymnist-inferred.onnx")
```
You can now run inference on the generated model with nnx:

```sh
nnx infer ./tymnist-inferred.onnx -i input=./data/mnist-7.png --labels ./data/models/mnist-labels.txt
```
To compare the result with what the Keras model itself predicts (requires `pip install numpy pillow matplotlib`):

```python
import PIL.Image
import numpy
import matplotlib.pyplot as plt

# Load the test image and reshape it to the model's expected input shape
m5 = PIL.Image.open("data/mnist-7.png").resize((28, 28), PIL.Image.ANTIALIAS)
nm5 = numpy.array(m5).reshape((1, 28, 28))
model.predict(nm5)
```