It looks something like this:
let lots_of_3s = (&[-123.456f32; 128][..]).simd_iter()
    .map(|v| { f32s::splat(9.0) * v.abs().sqrt().rsqrt().ceil().sqrt() -
               f32s::splat(4.0) - f32s::splat(2.0) })
    .scalar_collect::<Vec<f32>>();
Which is analogous to this scalar code:
let lots_of_3s = (&[-123.456f32; 128][..]).iter()
    .map(|v| { 9.0 * v.abs().sqrt().sqrt().recip().ceil().sqrt() -
               4.0 - 2.0 })
    .collect::<Vec<f32>>();
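To see why every element comes out as 3, here is a minimal standalone sketch of the scalar pipeline using only the standard library. The function names `pipeline` and `lots_of_3s` are made up for this illustration and are not part of faster.

```rust
/// Scalar version of the pipeline above, applied to a single element.
/// (Hypothetical helper for illustration; not part of the crate.)
fn pipeline(v: f32) -> f32 {
    // |-123.456| = 123.456, and sqrt(123.456) is roughly 11.11
    let root = v.abs().sqrt();
    // Reciprocal square root: 1 / sqrt(11.11) is roughly 0.3
    let rsqrt = root.sqrt().recip();
    // ceil(0.3) = 1.0 and sqrt(1.0) = 1.0, so the result is 9 - 4 - 2
    9.0 * rsqrt.ceil().sqrt() - 4.0 - 2.0
}

/// Applies the pipeline to a whole slice, like the iterator chain above.
fn lots_of_3s(input: &[f32]) -> Vec<f32> {
    input.iter().map(|&v| pipeline(v)).collect()
}
```

Calling `lots_of_3s(&[-123.456f32; 128])` yields 128 copies of `3.0`, since `ceil` rounds the intermediate value up to exactly `1.0`.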
The vector size is entirely determined by the machine you're compiling for - it attempts to use the largest vector size supported by your machine, and works on any platform or architecture (see below for details).
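As a rough illustration of what "determined by the machine you're compiling for" can mean, the sketch below picks an `f32` lane count from the target features enabled at compile time. This is an assumption about the general technique, not a description of how faster does it internally, and `f32_lanes` is a hypothetical name.

```rust
/// Hypothetical helper: number of f32 lanes in the widest x86 vector
/// register enabled for the compilation target. Falls back to 1 (scalar)
/// when none of the listed features are enabled, including on non-x86 targets.
const fn f32_lanes() -> usize {
    if cfg!(target_feature = "avx512f") {
        16 // 512-bit registers hold sixteen f32 values
    } else if cfg!(target_feature = "avx") {
        8 // 256-bit registers hold eight f32 values
    } else if cfg!(target_feature = "sse") {
        4 // 128-bit registers hold four f32 values
    } else {
        1 // no known vector extension: process one element at a time
    }
}
```

Because `cfg!` is resolved at compile time, the branch not taken costs nothing at runtime. Building with something like `RUSTFLAGS="-C target-cpu=native"` changes which features are enabled and therefore which branch is chosen.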
Compare this to traditional explicit SIMD:
use std::mem::transmute;
use stdsimd::{f32x4, f32x8};

let lots_of_3s = &mut [-123.456f32; 128][..];

if cfg!(all(not(target_feature = "avx"), target_feature = "sse")) {
    for ch in lots_of_3s.chunks_mut(4) {
        let v = f32x4::load(ch, 0);
        let scalar_abs_mask = unsafe { transmute::
Even with all of that boilerplate, this still only supports x86-64 machines with SSE or AVX.

** Upcoming Features
Zero-overhead support for uneven collections which don't fit entirely into a vector is upcoming. Also, zero-allocation collects are coming, to be called ~fill~.
By 0.2.0, this code will compile:
let mut some_u8s = [0u8; 100];
let filled_u8s = (&[0u8; 100][..]).simd_iter()
    .uneven_map(|vector| vector * splat(2), |scalar| scalar * 2)
    .scalar_fill(&mut some_u8s);
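The idea behind an uneven map with a zero-allocation fill can be sketched with the standard library alone: run one closure over every full vector-sized block, run a scalar closure over the leftover elements, and write into a caller-provided buffer. This is only an illustration under stated assumptions. `uneven_map_fill` and `LANES` are hypothetical names, a `[u8; LANES]` array stands in for a real SIMD vector, and none of this is the crate's planned implementation.

```rust
/// Hypothetical stand-in for a SIMD register width: 16 bytes per block.
const LANES: usize = 16;

/// Applies `vector_op` to every full LANES-sized block of `input` and
/// `scalar_op` to the leftover elements, writing results into `out`
/// without allocating. Returns the number of elements written.
fn uneven_map_fill<V, S>(input: &[u8], out: &mut [u8], vector_op: V, scalar_op: S) -> usize
where
    V: Fn([u8; LANES]) -> [u8; LANES],
    S: Fn(u8) -> u8,
{
    // Only process as many elements as both slices can hold.
    let n = input.len().min(out.len());
    let (input, out) = (&input[..n], &mut out[..n]);

    let mut in_chunks = input.chunks_exact(LANES);
    let mut out_chunks = out.chunks_exact_mut(LANES);

    // Full blocks: the "vector" path.
    for (src, dst) in in_chunks.by_ref().zip(out_chunks.by_ref()) {
        let mut block = [0u8; LANES];
        block.copy_from_slice(src);
        dst.copy_from_slice(&vector_op(block));
    }

    // Leftover elements that do not fill a block: the scalar path.
    for (src, dst) in in_chunks.remainder().iter().zip(out_chunks.into_remainder()) {
        *dst = scalar_op(*src);
    }
    n
}
```

With 100 input bytes and 16 lanes, six full blocks take the vector path and the last four bytes take the scalar path. For example, `uneven_map_fill(&input, &mut out, |b| b.map(|x| x.wrapping_mul(2)), |x| x.wrapping_mul(2))` doubles every element of `input` into `out`.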
More intrinsic traits are also coming; feel free to open an issue or pull request if you have one you'd like to see.

** Compatibility
Faster currently supports x86 machines with SSE and above, although AVX-512 support isn't working in rustc yet. Support for non-x86 architectures is currently blocked by stdsimd and rustc.
Of course, once those issues are resolved, adding support for ARM, MIPS, or any other intrinsics and vector lengths will be trivial.

** Performance
Here are some extremely unscientific benchmarks which, at least, suggest that this isn't any worse than scalar iterators.
running 4 tests
test tests::bench_nop_scalar  ... bench:          51 ns/iter (+/- 1)
test tests::bench_nop_simd    ... bench:          51 ns/iter (+/- 1)
test tests::bench_work_scalar ... bench:       1,276 ns/iter (+/- 39)
test tests::bench_work_simd   ... bench:         251 ns/iter (+/- 0)