tract-linalg
linalg stands for "linear algebra". This is a misnomer. This crate contains
low-level, architecture-dependent optimisations used by tract-core.
Functions
- MatMatMul: extended matrix*matrix product:
  - inspired by the GotoBLAS and BLIS micro-kernel approach
  - extended for convolution-friendly addressing (fused img2col)
  - fused output pipeline (min, max, and a few more simple, fast ops)
  - f32*f32 -> f32 (à la sgemm)
  - i8*i8 -> i32 accumulator -> i32 storage
  - i8*i8 -> i32 accumulator -> i8 (with channel zero point and scale, and re-quantization pipeline)
- f32 sigmoid and f32 tanh: at f32 precision, by a rational function (no exponentiation)
- byte-to-byte lookup table
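
As an illustration of the i8*i8 -> i32 accumulator -> i8 variant listed above, here is a minimal scalar sketch in plain Rust. It is not the crate's API: the function names `matmul_i8_i32` and `requantize` are hypothetical, a single scale and zero point stand in for the per-channel values, and the rounding mode (round half away from zero) is an assumption.

```rust
/// Multiply `a` (m x k) by `b` (k x n), both row-major i8,
/// widening every product to i32 before accumulating.
/// Hypothetical helper, not part of tract-linalg.
fn matmul_i8_i32(a: &[i8], b: &[i8], m: usize, k: usize, n: usize) -> Vec<i32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut c = vec![0i32; m * n];
    for i in 0..m {
        for p in 0..k {
            let av = a[i * k + p] as i32;
            for j in 0..n {
                c[i * n + j] += av * b[p * n + j] as i32;
            }
        }
    }
    c
}

/// Bring one i32 accumulator back to i8: scale, round, add the zero point,
/// then saturate to the i8 range.
/// Assumption: one scale and zero point for the whole output.
fn requantize(acc: i32, scale: f32, zero_point: i32) -> i8 {
    let scaled = (acc as f32 * scale).round() as i32;
    scaled.saturating_add(zero_point).clamp(-128, 127) as i8
}
```

Calling `matmul_i8_i32` and then mapping `requantize` over the result gives the i8 output. Stopping after the first step corresponds to the i32 storage variant.
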
Implementations
| | generic fallback | armv6, vfp | armv7 neon | armv8 simd | x64 FMA
|-------------------|--------------------|---------------|-------------------|-------------------|-----------------
| MatMatMul f32 | | 4x4 | 8x4 | 8x8 | 16x6
| MatMatMul i8->i8 | | | 8x4 | | 8x8
| MatMatMul i8->i32 | | | | | 8x8
| sigmoid f32 | | | 4n | 4n |
| tanh f32 | | | 4n | 4n |
| byte lookup | | | | |
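
Assuming the MatMatMul entries in the table (4x4, 8x8, 16x6, ...) are the dimensions of the output tile each micro-kernel computes, the GotoBLAS/BLIS-style idea can be sketched in portable Rust as follows. The function `micro_kernel`, the packing layout, and the const parameters `MR` and `NR` are illustrative assumptions. The real kernels are architecture-specific and fuse further operations into the output stage.

```rust
/// Compute one MR x NR tile of C = A * B from packed panels.
/// `a_panel` holds k groups of MR values (one column slice of A per step),
/// `b_panel` holds k groups of NR values (one row slice of B per step).
/// Hypothetical sketch, not the crate's kernel interface.
fn micro_kernel<const MR: usize, const NR: usize>(
    k: usize,
    a_panel: &[f32],
    b_panel: &[f32],
) -> [[f32; NR]; MR] {
    assert_eq!(a_panel.len(), k * MR);
    assert_eq!(b_panel.len(), k * NR);
    // The whole tile stays in local accumulators for the duration of the loop.
    let mut tile = [[0.0f32; NR]; MR];
    for p in 0..k {
        let a_col = &a_panel[p * MR..(p + 1) * MR];
        let b_row = &b_panel[p * NR..(p + 1) * NR];
        // Rank-1 update: tile += a_col * b_row^T
        for i in 0..MR {
            for j in 0..NR {
                tile[i][j] += a_col[i] * b_row[j];
            }
        }
    }
    tile
}
```

With this reading, `micro_kernel::<8, 8>` would play the role of an 8x8 kernel: a larger tile reuses each loaded value of A and B more often, bounded by the number of registers available on the target.
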