Efficient argmin & argmax (in 1 function) with SIMD (SSE, AVX(2), AVX512, NEON) for `f16`, `f32`, `f64`, `i16`, `i32`, `i64`, `u16`, `u32`, `u64` on `ndarray::ArrayView1`.
🚀 The function is generic over the type of the array, so it can be used on an `ndarray::ArrayView1<T>` where `T` can be `f16`*, `f32`, `f64`, `i16`, `i32`, `i64`, `u16`, `u32`, `u64`.
⚡ Runtime CPU feature detection is used to select the most efficient implementation for the current CPU. This means the same binary can be used on different CPUs without recompilation.
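As a rough illustration of runtime dispatch (not this crate's actual code), Rust's standard library provides `is_x86_feature_detected!` and, on AArch64, `std::arch::is_aarch64_feature_detected!`. The sketch below picks a backend name once at runtime; the helper name `detect_backend`, the feature strings chosen and the order of the checks are assumptions for illustration.

```rust
/// Hypothetical helper: returns the name of the widest SIMD backend the
/// current CPU supports, falling back to a scalar implementation.
fn detect_backend() -> &'static str {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // These checks run on the machine executing the binary,
        // not on the machine that compiled it.
        if is_x86_feature_detected!("avx512f") {
            return "avx512";
        }
        if is_x86_feature_detected!("avx2") {
            return "avx2";
        }
        if is_x86_feature_detected!("sse4.1") {
            return "sse";
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return "neon";
        }
    }
    // No supported SIMD extension found (or an unsupported architecture).
    "scalar"
}

fn main() {
    println!("selected backend: {}", detect_backend());
}
```

A real dispatcher would typically map each name to a function compiled with the matching `#[target_feature]` attribute and cache the choice.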
👀 The SIMD implementation contains no `if` checks, so the function's runtime is independent of the input data and its order (best case = worst case = average case).
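To show what "no `if` checks" can look like, here is a minimal scalar emulation of a SIMD argmin, assuming 4 lanes of `i32` and a length that is a non-zero multiple of 4 (both simplifications, not the crate's code). Each lane compares, builds an all-ones/all-zeros mask and blends values and indices with bitwise operations instead of branching; only the final reduction over the fixed number of lanes picks a winner. The names `LANES`, `blend` and `argmin_lanes` are hypothetical.

```rust
const LANES: usize = 4;

/// Branch-free select: keep `new` where `mask` is all ones (-1), else `old`.
fn blend(mask: i32, new: i32, old: i32) -> i32 {
    (new & mask) | (old & !mask)
}

/// SIMD-style argmin over `i32`, emulating LANES-wide vectors with arrays.
/// Assumes `arr.len()` is a non-zero multiple of LANES and fits in an i32 index.
fn argmin_lanes(arr: &[i32]) -> usize {
    assert!(!arr.is_empty() && arr.len() % LANES == 0);
    // Per-lane running minimum and the index where it was found.
    let mut min_vals: [i32; LANES] = std::array::from_fn(|l| arr[l]);
    let mut min_idx: [i32; LANES] = std::array::from_fn(|l| l as i32);

    for (c, chunk) in arr.chunks_exact(LANES).enumerate().skip(1) {
        for l in 0..LANES {
            let idx = (c * LANES + l) as i32;
            // -1 (all ones) if the new value is strictly smaller, else 0.
            let mask = -((chunk[l] < min_vals[l]) as i32);
            min_vals[l] = blend(mask, chunk[l], min_vals[l]);
            min_idx[l] = blend(mask, idx, min_idx[l]);
        }
    }

    // Horizontal reduction over a fixed number of lanes:
    // smallest value first, ties broken by the smallest index.
    let best = (0..LANES)
        .min_by_key(|&l| (min_vals[l], min_idx[l]))
        .unwrap();
    min_idx[best] as usize
}

fn main() {
    println!("{}", argmin_lanes(&[5, 3, 8, 1, 7, 0, 9, 1])); // 5
}
```

The number of loop iterations depends only on the input length, which is what makes the best, worst and average cases coincide.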
🪄 Efficient support for `f16` and uints: through bijective (aka symmetric) bitwise operations, `f16` (optional) and uints are converted to ordered integers, which allows integer SIMD instructions to be used.
*For `f16`, you should enable the `half` feature.
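The ordered-integer idea can be sketched with two well-known bit tricks; these are illustrative assumptions and may differ from the exact operations the crate uses. For `u16`, flipping the top bit maps unsigned order onto signed `i16` order. For raw half-precision bits, negative values (sign bit set) get their 15 magnitude bits flipped, so that signed integer comparison matches numeric order for non-NaN values. Both maps are XORs, so applying them twice returns the input. The function names are hypothetical, and `f16` values are given as raw bit patterns to avoid depending on the `half` crate.

```rust
/// Map a u16 to an i16 with the same ordering by flipping the top bit.
fn u16_to_ordered_i16(x: u16) -> i16 {
    (x ^ 0x8000) as i16
}

/// Map raw IEEE 754 half-precision bits to an i16 whose signed order matches
/// the numeric order of the (non-NaN) f16 values.
fn f16_bits_to_ordered_i16(bits: u16) -> i16 {
    let v = bits as i16;
    // Arithmetic shift: -1 for negative floats, 0 otherwise.
    // Shifting the u16 version right by 1 gives 0x7FFF (negative) or 0 (positive).
    let mask = ((v >> 15) as u16 >> 1) as i16;
    // Negative floats: flip magnitude bits so larger magnitudes become smaller ints.
    v ^ mask
}

/// Index of the first minimum after mapping every element through `key`.
fn argmin_by_key<T: Copy>(xs: &[T], key: impl Fn(T) -> i16) -> Option<usize> {
    xs.iter()
        .enumerate()
        .min_by_key(|&(_, &x)| key(x))
        .map(|(i, _)| i)
}

fn main() {
    // Raw f16 bit patterns for 1.0, -2.0 and 0.5.
    let halves = [0x3C00u16, 0xC000, 0x3800];
    let uints = [300u16, 65535, 7];
    println!("{:?}", argmin_by_key(&halves, f16_bits_to_ordered_i16)); // Some(1)
    println!("{:?}", argmin_by_key(&uints, u16_to_ordered_i16)); // Some(2)
}
```

Once the data is in this form, the same signed-integer SIMD kernels can serve `f16`, `u16` and `i16`.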
Add the following to your `Cargo.toml`:

```toml
[dependencies]
argminmax = "0.2"
```
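```rust
use argminmax::ArgMinMax; // extension trait for ndarray::ArrayView1
use ndarray::Array1; // `ndarray` must also be listed as a dependency

fn main() {
    // Example input (illustrative): 0, 1, ..., 199_999
    let arr: Vec<i32> = (0..200_000).collect();
    let arr: Array1<i32> = Array1::from(arr);

    let (min, max) = arr.view().argminmax(); // apply extension

    println!("min: {}, max: {}", min, max);
    println!("arr[min]: {}, arr[max]: {}", arr[min], arr[max]);
}
```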
Benchmarks on my laptop (AMD Ryzen 7 4800U, 1.8 GHz, 16GB RAM) using Criterion show that the function is 3-20x faster than the scalar implementation (depending on the data type). See `/benches/results`.
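For reference, a scalar argminmax of the kind used as a baseline can be sketched as a single generic pass that tracks the first minimum and first maximum. This is an assumed, simplified version (the function name `scalar_argminmax` is hypothetical and the crate's own scalar fallback may differ); unlike the SIMD path, it branches on the data.

```rust
/// Scalar reference: one pass tracking the first minimum and the first maximum.
/// Returns `None` for an empty slice.
fn scalar_argminmax<T: PartialOrd + Copy>(arr: &[T]) -> Option<(usize, usize)> {
    let first = *arr.first()?;
    let (mut min_i, mut max_i) = (0, 0);
    let (mut min_v, mut max_v) = (first, first);
    for (i, &v) in arr.iter().enumerate().skip(1) {
        // Strict comparisons keep the earliest index on ties.
        if v < min_v {
            min_v = v;
            min_i = i;
        }
        if v > max_v {
            max_v = v;
            max_i = i;
        }
    }
    Some((min_i, max_i))
}

fn main() {
    let data = [3, 1, 4, 1, 5, 9, 2, 6];
    println!("{:?}", scalar_argminmax(&data)); // Some((1, 5))
}
```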
Run the benchmarks yourself with the following command:

```bash
cargo bench --quiet --message-format=short --features half | grep "time:"
```
To run the tests, use the following command:

```bash
cargo test --message-format=short --features half
```
Does not support NaNs. *(Infinities are probably not supported for `f16` either.)*
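Since NaNs are not supported, a caller who cannot rule them out may want to check the input first. The sketch below is a hypothetical caller-side guard (not part of the crate's API); it also shows why NaNs are problematic: every ordered comparison involving NaN is undefined, so `partial_cmp` returns `None`.

```rust
/// Hypothetical caller-side guard: true if the slice contains a NaN,
/// in which case an argmin/argmax result should not be trusted.
fn contains_nan(arr: &[f32]) -> bool {
    arr.iter().any(|x| x.is_nan())
}

fn main() {
    let clean = [1.0f32, -3.0, 2.0];
    let dirty = [1.0f32, f32::NAN, 2.0];
    println!("{} {}", contains_nan(&clean), contains_nan(&dirty)); // false true

    // NaN has no ordering relative to any value.
    println!("{:?}", f32::NAN.partial_cmp(&1.0)); // None
}
```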
Some parts of this library are inspired by the great work of minimalrust's argmm project.