memointsqrt

Definitely run the benchmarks before using this. In my benchmarks, computing .sqrt().recip() directly on f32s was often faster than using the lookup table. For f64s, the lookup table was faster for the inverse square root, but not for the square root.
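The comparison above is between reading a precomputed value from a table and computing it on the spot. The following is a minimal sketch of both sides for f64, assuming the table simply stores sqrt(n) and 1/sqrt(n) for every integer n below some length. `SqrtTable` and `time_over` are hypothetical names made up for illustration. They are not this crate's API, and the timing helper is only a crude stand-in for the real benchmarks.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical memoized table (illustration only, not this crate's API):
/// sqrt(n) and 1/sqrt(n) precomputed for every integer n in 0..len.
pub struct SqrtTable {
    sqrt: Vec<f64>,
    inv_sqrt: Vec<f64>,
}

impl SqrtTable {
    pub fn new(len: usize) -> Self {
        let sqrt: Vec<f64> = (0..len).map(|n| (n as f64).sqrt()).collect();
        // recip() of 0.0 is +inf, the same value .sqrt().recip() gives for 0.
        let inv_sqrt = sqrt.iter().map(|s| s.recip()).collect();
        SqrtTable { sqrt, inv_sqrt }
    }

    /// Table lookup; returns None when n is outside the precomputed range.
    pub fn sqrt(&self, n: usize) -> Option<f64> {
        self.sqrt.get(n).copied()
    }

    /// Table lookup; returns None when n is outside the precomputed range.
    pub fn inv_sqrt(&self, n: usize) -> Option<f64> {
        self.inv_sqrt.get(n).copied()
    }
}

/// Crude wall-clock timing of `f` over the inputs 0..n. black_box keeps the
/// optimizer from discarding the work; this is not a substitute for a proper
/// benchmark harness.
pub fn time_over(n: usize, mut f: impl FnMut(usize) -> f64) -> Duration {
    let start = Instant::now();
    let mut acc = 0.0;
    for i in 0..n {
        acc += f(black_box(i));
    }
    black_box(acc);
    start.elapsed()
}
```

A possible usage, comparing the two approaches for the inverse square root:

```rust
let table = SqrtTable::new(1 << 16);
let lookup = time_over(1 << 16, |i| table.inv_sqrt(i).unwrap());
let direct = time_over(1 << 16, |i| (i as f64).sqrt().recip());
println!("lookup: {:?}, direct: {:?}", lookup, direct);
```

Which side wins depends on the float type, the CPU, and whether the table stays in cache, which is why the results above differ between f32 and f64 and why measuring on your own hardware matters.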