DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) using Deep Filtering.
Original DeepFilterNet Paper: DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering
New DeepFilterNet2 Paper: DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio
This framework supports Linux, MacOS, and Windows. Training is only tested under Linux. The framework is structured as follows:

- `libDF` contains Rust code used for data loading and augmentation.
- `DeepFilterNet` contains DeepFilterNet code for training, evaluation, and visualization as well as pretrained model weights.
- `pyDF` contains a Python wrapper of the libDF STFT/ISTFT processing loop.
- `pyDF-data` contains a Python wrapper of the libDF dataset functionality and provides a PyTorch data loader.

Install the DeepFilterNet Python package via pip:
```bash
pip install torch torchaudio -f https://download.pytorch.org/whl/cpu/torch_stable.html
pip install deepfilternet
# or install including training functionality (Linux only):
pip install deepfilternet[train]
```
To enhance noisy audio files using DeepFilterNet run:
```bash
deepFilter path/to/noisy_audio.wav
```
Install cargo via rustup. Usage of a conda environment or virtualenv is recommended.
Installation of Python dependencies and libDF:
```bash
cd path/to/DeepFilterNet/  # cd into repository
pip install torch torchaudio -f https://download.pytorch.org/whl/cpu/torch_stable.html
pip install maturin poetry
maturin develop --release -m pyDF/Cargo.toml
maturin develop --release -m pyDF-data/Cargo.toml
cd DeepFilterNet
poetry install -E train -E eval  # Note: This globally installs DeepFilterNet in your environment
# Alternatively, install only the dependencies without the DeepFilterNet package itself:
poetry install -E train -E eval --no-root
export PYTHONPATH=$PWD
```
To enhance noisy audio files using DeepFilterNet run
```bash
$ python DeepFilterNet/df/enhance.py --help
usage: enhance.py [-h] [--model-base-dir MODEL_BASE_DIR] [--pf] [--output-dir OUTPUT_DIR]
                  [--log-level LOG_LEVEL] [--compensate-delay]
                  noisy_audio_files [noisy_audio_files ...]

positional arguments:
  noisy_audio_files     List of noisy audio files to enhance.

optional arguments:
  -h, --help            show this help message and exit
  --model-base-dir MODEL_BASE_DIR, -m MODEL_BASE_DIR
                        Model directory containing checkpoints and config. To
                        load a pretrained model, you may just provide the model
                        name, e.g. DeepFilterNet. By default, the pretrained
                        DeepFilterNet2 model is loaded.
  --pf                  Post-filter that slightly over-attenuates very noisy
                        sections.
  --output-dir OUTPUT_DIR, -o OUTPUT_DIR
                        Directory in which the enhanced audio files will be
                        stored.
  --log-level LOG_LEVEL
                        Logger verbosity. Can be one of (debug, info, error,
                        none)
  --compensate-delay, -D
                        Add some padding to compensate for the delay introduced
                        by the real-time STFT/ISTFT implementation.
# Run with either the DeepFilterNet or DeepFilterNet2 model:
python DeepFilterNet/df/enhance.py -m DeepFilterNet path/to/noisy_audio.wav
python DeepFilterNet/df/enhance.py -m DeepFilterNet2 path/to/noisy_audio.wav
```
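To build intuition for `--compensate-delay`: an overlap-add STFT/ISTFT round trip introduces an algorithmic look-ahead delay of roughly `fft_size - hop_size` samples. The sketch below illustrates the arithmetic; the window and hop sizes are illustrative assumptions, not the values used internally by libDF.

```python
# Sketch: algorithmic delay of an overlap-add STFT/ISTFT round trip.
# NOTE: fft_size/hop_size below are illustrative assumptions; they are
# not necessarily the values libDF uses internally.
def stft_delay_ms(fft_size: int, hop_size: int, sr: int) -> float:
    """Delay in milliseconds introduced by the analysis/synthesis windowing."""
    delay_samples = fft_size - hop_size
    return 1000.0 * delay_samples / sr

# Example: a 20 ms window with 50% overlap at 48 kHz
print(stft_delay_ms(960, 480, 48000))  # → 10.0 (i.e. 10 ms to compensate)
```

`--compensate-delay` pads the input accordingly so that the enhanced output stays time-aligned with the original file.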
The entry point is DeepFilterNet/df/train.py. It expects a data directory containing HDF5 datasets as well as a dataset configuration JSON file.
So, you first need to create your datasets in HDF5 format. Each dataset typically only holds a training, validation, or test set of noise, speech, or RIRs.
```bash
# Install additional dependencies for dataset creation
pip install h5py librosa soundfile
cd path/to/DeepFilterNet/DeepFilterNet
# Prepare a text file listing the audio files to include.
# The dataset type is one of `speech`, `noise`, `rir`:
python df/prepare_data.py --sr 48000 speech training_set.txt TRAIN_SET_SPEECH.hdf5
```
All datasets should be made available in one dataset folder for the train script.
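The text file passed to `prepare_data.py` lists one audio file path per line. A minimal sketch for generating such a list; the directory name `speech_corpus/` and the `.wav` extension are assumptions to adapt to your own dataset layout:

```python
# Sketch: write a file list for prepare_data.py, one path per line.
# "speech_corpus/" and "*.wav" below are assumptions; adjust as needed.
from pathlib import Path

def write_file_list(audio_dir: str, out_txt: str, pattern: str = "*.wav") -> int:
    """Collect audio files matching pattern under audio_dir into out_txt.

    Returns the number of files written."""
    paths = sorted(Path(audio_dir).rglob(pattern))
    Path(out_txt).write_text("\n".join(str(p) for p in paths) + "\n")
    return len(paths)

# write_file_list("speech_corpus/", "training_set.txt")
```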
The dataset configuration file should contain 3 entries: "train", "valid", "test". Each of those contains a list of datasets (e.g. a speech, noise, and a RIR dataset). You can use multiple speech or noise datasets. Optionally, a sampling factor may be specified that can be used to over/under-sample a dataset. Say you have a specific dataset with transient noises and want to increase the amount of non-stationary noises by oversampling. In most cases you want to set this factor to 1.
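The effect of the sampling factor can be illustrated with weighted sampling; this is a conceptual sketch (the file names are hypothetical and the sampler is a stand-in, not libDF's actual implementation):

```python
# Sketch: conceptual effect of per-dataset sampling factors.
# A dataset with factor 2.0 is drawn roughly twice as often as one with
# factor 1.0. This mimics, but does not reproduce, libDF's sampler.
import random

def draw_dataset(datasets, rng):
    """Pick a dataset name with probability proportional to its factor."""
    names = [name for name, _ in datasets]
    factors = [factor for _, factor in datasets]
    return rng.choices(names, weights=factors, k=1)[0]

rng = random.Random(0)
cfg = [("STATIONARY_NOISE.hdf5", 1.0), ("TRANSIENT_NOISE.hdf5", 2.0)]
counts = {name: 0 for name, _ in cfg}
for _ in range(3000):
    counts[draw_dataset(cfg, rng)] += 1
# TRANSIENT_NOISE.hdf5 ends up drawn about twice as often as the other set.
```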
Dataset config example:
`dataset.cfg`:
```json
{
  "train": [
    ["TRAIN_SET_SPEECH.hdf5", 1.0],
    ["TRAIN_SET_NOISE.hdf5", 1.0],
    ["TRAIN_SET_RIR.hdf5", 1.0]
  ],
  "valid": [
    ["VALID_SET_SPEECH.hdf5", 1.0],
    ["VALID_SET_NOISE.hdf5", 1.0],
    ["VALID_SET_RIR.hdf5", 1.0]
  ],
  "test": [
    ["TEST_SET_SPEECH.hdf5", 1.0],
    ["TEST_SET_NOISE.hdf5", 1.0],
    ["TEST_SET_RIR.hdf5", 1.0]
  ]
}
```
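Since a misconfigured file only surfaces at training startup, a quick sanity check of the config beforehand can save a failed run. A minimal sketch (this helper is not part of DeepFilterNet itself):

```python
# Sketch: validate a dataset configuration file of the shape shown above.
# Checks the three required splits and that each entry is a
# [hdf5_name, sampling_factor] pair. Not part of DeepFilterNet itself.
import json

def check_dataset_cfg(path):
    """Load a dataset config and assert its basic structure."""
    with open(path) as f:
        cfg = json.load(f)
    for split in ("train", "valid", "test"):
        assert split in cfg, f"missing split: {split}"
        for entry in cfg[split]:
            name, factor = entry  # [HDF5 file name, sampling factor]
            assert name.endswith(".hdf5"), f"not an HDF5 dataset: {name}"
            assert factor > 0, f"sampling factor must be positive: {factor}"
    return cfg
```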
Finally, start the training script. The training script may create a model base_dir if it does not exist; it is used for logging, some audio samples, model checkpoints, and the config. If no config file is found, a default config will be created. See DeepFilterNet/pretrained_models/DeepFilterNet for a config file example.
```bash
python df/train.py path/to/dataset.cfg path/to/data_dir/ path/to/base_dir/
```
If you use this framework, please cite: DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering
```bibtex
@inproceedings{schroeter2022deepfilternet,
  title = {{DeepFilterNet}: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering},
  author = {Hendrik Schröter and Alberto N. Escalante-B. and Tobias Rosenkranz and Andreas Maier},
  booktitle = {ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year = {2022},
  organization = {IEEE}
}
```
If you use the DeepFilterNet2 model, please cite: DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio
```bibtex
@misc{schroeter2022deepfilternet2,
  title = {{DeepFilterNet2}: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio},
  author = {Schröter, Hendrik and Escalante-B., Alberto N. and Rosenkranz, Tobias and Maier, Andreas},
  publisher = {arXiv},
  year = {2022},
  url = {https://arxiv.org/abs/2205.05474},
}
```
DeepFilterNet is free and open source! All code in this repository is dual-licensed under either:

- Apache License, Version 2.0
- MIT License

at your option. This means you can select the license you prefer!
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.