A Rust implementation of mel spectrograms aligned to the results from the whisper.cpp, pytorch and librosa reference implementations and suited to streaming audio.
The main objective is to allow inference from spectrograms alone, so that audio samples don't need to be kept in context for follow-up processing.
Mel filter banks are within 1.0e-7 of librosa.filters.mel and identical to the GGML model-embedded filters used by whisper.cpp.
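To make the filter bank construction concrete, here is a minimal sketch of a triangular mel filter bank. It is intended to follow librosa's default Slaney mel scale with Slaney area normalization. It is not this crate's API: the function names `hz_to_mel`, `mel_to_hz` and `mel_filter_bank` are hypothetical, `fmin` is assumed to be 0 and `fmax` half the sample rate, and the output has not been checked against librosa to the tolerance quoted above.

```rust
/// Slaney-style Hz -> mel: linear below 1 kHz, logarithmic above.
fn hz_to_mel(hz: f64) -> f64 {
    const F_SP: f64 = 200.0 / 3.0; // Hz per mel in the linear region
    const MIN_LOG_HZ: f64 = 1000.0; // start of the logarithmic region
    const MIN_LOG_MEL: f64 = MIN_LOG_HZ / F_SP;
    let logstep = 6.4_f64.ln() / 27.0;
    if hz >= MIN_LOG_HZ {
        MIN_LOG_MEL + (hz / MIN_LOG_HZ).ln() / logstep
    } else {
        hz / F_SP
    }
}

/// Inverse of `hz_to_mel`.
fn mel_to_hz(mel: f64) -> f64 {
    const F_SP: f64 = 200.0 / 3.0;
    const MIN_LOG_HZ: f64 = 1000.0;
    const MIN_LOG_MEL: f64 = MIN_LOG_HZ / F_SP;
    let logstep = 6.4_f64.ln() / 27.0;
    if mel >= MIN_LOG_MEL {
        MIN_LOG_HZ * (logstep * (mel - MIN_LOG_MEL)).exp()
    } else {
        mel * F_SP
    }
}

/// Builds `n_mels` triangular filters over the `n_fft / 2 + 1` FFT bins.
/// Assumes fmin = 0 Hz and fmax = sample_rate / 2.
fn mel_filter_bank(sample_rate: f64, n_fft: usize, n_mels: usize) -> Vec<Vec<f64>> {
    let n_bins = n_fft / 2 + 1;
    // Centre frequency of each FFT bin in Hz.
    let fft_freqs: Vec<f64> = (0..n_bins)
        .map(|i| i as f64 * sample_rate / n_fft as f64)
        .collect();
    // n_mels + 2 band edges, evenly spaced on the mel scale.
    let mel_max = hz_to_mel(sample_rate / 2.0);
    let edges: Vec<f64> = (0..n_mels + 2)
        .map(|i| mel_to_hz(mel_max * i as f64 / (n_mels + 1) as f64))
        .collect();

    (0..n_mels)
        .map(|m| {
            let (lo, mid, hi) = (edges[m], edges[m + 1], edges[m + 2]);
            // Slaney normalization: scale each triangle by 2 / bandwidth.
            let enorm = 2.0 / (hi - lo);
            fft_freqs
                .iter()
                .map(|&f| {
                    let rising = (f - lo) / (mid - lo);
                    let falling = (hi - f) / (hi - mid);
                    rising.min(falling).max(0.0) * enorm
                })
                .collect::<Vec<f64>>()
        })
        .collect()
}
```

Calling `mel_filter_bank(16000.0, 400, 80)` with Whisper's usual settings (16 kHz audio, 400-point FFT, 80 mel bands) yields 80 rows of 201 weights each.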
An STFT implementation that allows creating spectrograms from an audio stream, near-identical to those produced by whisper.cpp internally.
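The streaming idea can be sketched as follows: buffer incoming samples, emit an overlapping Hann-windowed frame whenever enough audio has arrived, and keep only the tail needed for the next frame. This is an illustration under stated assumptions, not this crate's implementation. `StreamingFramer` and `power_spectrum` are hypothetical names, the DFT is a naive O(n²) loop used for clarity instead of an FFT, and padding and log-mel scaling are left out.

```rust
use std::f64::consts::PI;

/// Hypothetical streaming framer: accepts chunks of any length and
/// returns every complete frame of `n_fft` samples, advancing by `hop`.
/// Assumes 0 < hop <= n_fft.
struct StreamingFramer {
    n_fft: usize,
    hop: usize,
    window: Vec<f64>,
    buffer: Vec<f64>,
}

impl StreamingFramer {
    fn new(n_fft: usize, hop: usize) -> Self {
        // Periodic Hann window of length n_fft.
        let window = (0..n_fft)
            .map(|i| 0.5 - 0.5 * (2.0 * PI * i as f64 / n_fft as f64).cos())
            .collect();
        Self { n_fft, hop, window, buffer: Vec::new() }
    }

    /// Appends `samples` and returns the windowed frames that are now complete.
    fn push(&mut self, samples: &[f64]) -> Vec<Vec<f64>> {
        self.buffer.extend_from_slice(samples);
        let mut frames = Vec::new();
        while self.buffer.len() >= self.n_fft {
            let frame: Vec<f64> = self.buffer[..self.n_fft]
                .iter()
                .zip(&self.window)
                .map(|(s, w)| s * w)
                .collect();
            frames.push(frame);
            // Drop only `hop` samples so consecutive frames overlap.
            self.buffer.drain(..self.hop);
        }
        frames
    }
}

/// Power spectrum of one frame via a naive DFT (bins 0..=n/2).
fn power_spectrum(frame: &[f64]) -> Vec<f64> {
    let n = frame.len();
    (0..=n / 2)
        .map(|k| {
            let (mut re, mut im) = (0.0_f64, 0.0_f64);
            for (t, x) in frame.iter().enumerate() {
                let angle = -2.0 * PI * (k * t) as f64 / n as f64;
                re += x * angle.cos();
                im += x * angle.sin();
            }
            re * re + im * im
        })
        .collect()
}
```

Because the framer keeps leftover samples between calls, the frames do not depend on how the audio is chunked: pushing 1000 samples at once or as two chunks of 500 gives the same frames when `n_fft` is 400 and `hop` is 160 (Whisper's usual values, assumed here).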
An example of whisper inference from mel spectrograms via whisper-rs can be found in the tests.