A simple text-to-speech client for Azure TTS API. :laughing:
You can try the Azure TTS API online: https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech
Starting from version 4.0.0, aspeak
is rewritten in rust. The old python version is available at the python
branch.
Please note that the rust rewritten version is experimental and might have bugs!
Download the latest release from here.
After downloading, extract the archive and you will get a binary executable file.
You can put it in a directory that is in your PATH
environment variable so that you can run it from anywhere.
Installing from PyPI will also install the python binding of aspeak
for you. Check Library Usage#Python for more information on using the python binding.
bash
pip install -U aspeak
Now the prebuilt wheels are only available for x86_64 architecture. Due to some technical issues, I haven't uploaded the source distribution to PyPI yet. So to build wheel from source, you need to follow the instructions in Install from Source.
Because of manylinux compatibility issues, the wheels for linux are not available on PyPI. (But you can still build them from source.)
The easiest way to install aspeak
is to use cargo:
bash
cargo install aspeak
To build the python wheel, you need to install maturin
first:
bash
pip install maturin
After cloning the repository and cd
into the directory
, you can build the wheel by running:
bash
maturin build --release --strip -F python --bindings pyo3 --interpreter python --manifest-path Cargo.toml --out dist-pyo3
maturin build --release --strip --bindings bin --interpreter python --manifest-path Cargo.toml --out dist-bin
bash merge-wheel.bash
If everything goes well, you will get a wheel file in the dist
directory.
Run aspeak help
to see the help message.
Run aspeak help <subcommand>
to see the help message of a subcommand.
You can configure aspeak
by creating a profile. Run the following command to create a profile:
sh
$ aspeak config init
To edit the profile, run:
sh
$ aspeak config edit
If you have trouble running the above command, you can edit the profile manually:
Fist get the path of the profile by running:
sh
$ aspeak config where
Then edit the file with your favorite text editor.
The profile is a TOML file. The default profile looks like this:
Check the comments in the config file for more information about available options.
```toml
verbosity = 0
#
#
[auth]
#
#
[text]
locale = "en-US"
rate = 0
pitch = 0
role = "Boy"
style = "general"
#
#
[output]
container = "wav"
aspeak list-qualities
to see available qualities.#
quality = 0
aspeak list-formats
to see available formats.```
sh
$ aspeak text "Hello, world"
sh
$ aspeak ssml << EOF
<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'><voice name='en-US-JennyNeural'>Hello, world!</voice></speak>
EOF
sh
$ aspeak list-voices
sh
$ aspeak list-voices -l zh-CN
sh
$ aspeak list-voices -v en-US-SaraNeural
Microsoft Server Speech Text to Speech Voice (en-US, SaraNeural)
Display name: Sara
Local name: Sara @ en-US
Locale: English (United States)
Gender: Female
ID: en-US-SaraNeural
Voice type: Neural
Status: GA
Sample rate: 48000Hz
Words per minute: 157
Styles: ["angry", "cheerful", "excited", "friendly", "hopeful", "sad", "shouting", "terrified", "unfriendly", "whispering"]
sh
$ aspeak text "Hello, world" -o output.wav
If you prefer mp3/ogg/webm, you can use -c mp3
/-c ogg
/-c webm
option.
sh
$ aspeak text "Hello, world" -o output.mp3 -c mp3
$ aspeak text "Hello, world" -o output.ogg -c ogg
$ aspeak text "Hello, world" -o output.webm -c webm
sh
$ aspeak list-qualities
``` Qualities for MP3: 3: audio-48khz-192kbitrate-mono-mp3 2: audio-48khz-96kbitrate-mono-mp3 -3: audio-16khz-64kbitrate-mono-mp3 1: audio-24khz-160kbitrate-mono-mp3 -2: audio-16khz-128kbitrate-mono-mp3 -4: audio-16khz-32kbitrate-mono-mp3 -1: audio-24khz-48kbitrate-mono-mp3 0: audio-24khz-96kbitrate-mono-mp3
Qualities for WAV: -2: riff-8khz-16bit-mono-pcm 1: riff-24khz-16bit-mono-pcm 0: riff-24khz-16bit-mono-pcm -1: riff-16khz-16bit-mono-pcm
Qualities for OGG: 0: ogg-24khz-16bit-mono-opus -1: ogg-16khz-16bit-mono-opus 1: ogg-48khz-16bit-mono-opus
Qualities for WEBM: 0: webm-24khz-16bit-mono-opus -1: webm-16khz-16bit-mono-opus 1: webm-24khz-16bit-24kbps-mono-opus ```
sh
$ aspeak list-formats
amr-wb-16000hz
audio-16khz-128kbitrate-mono-mp3
audio-16khz-16bit-32kbps-mono-opus
audio-16khz-32kbitrate-mono-mp3
audio-16khz-64kbitrate-mono-mp3
audio-24khz-160kbitrate-mono-mp3
audio-24khz-16bit-24kbps-mono-opus
audio-24khz-16bit-48kbps-mono-opus
audio-24khz-48kbitrate-mono-mp3
audio-24khz-96kbitrate-mono-mp3
audio-48khz-192kbitrate-mono-mp3
audio-48khz-96kbitrate-mono-mp3
ogg-16khz-16bit-mono-opus
ogg-24khz-16bit-mono-opus
ogg-48khz-16bit-mono-opus
raw-16khz-16bit-mono-pcm
raw-16khz-16bit-mono-truesilk
raw-22050hz-16bit-mono-pcm
raw-24khz-16bit-mono-pcm
raw-24khz-16bit-mono-truesilk
raw-44100hz-16bit-mono-pcm
raw-48khz-16bit-mono-pcm
raw-8khz-16bit-mono-pcm
raw-8khz-8bit-mono-alaw
raw-8khz-8bit-mono-mulaw
riff-16khz-16bit-mono-pcm
riff-22050hz-16bit-mono-pcm
riff-24khz-16bit-mono-pcm
riff-44100hz-16bit-mono-pcm
riff-48khz-16bit-mono-pcm
riff-8khz-16bit-mono-pcm
riff-8khz-8bit-mono-alaw
riff-8khz-8bit-mono-mulaw
webm-16khz-16bit-mono-opus
webm-24khz-16bit-24kbps-mono-opus
webm-24khz-16bit-mono-opus
```sh
$ aspeak text "Hello, world" -o output.mp3 -c mp3 -q=-1
$ aspeak text "Hello, world" -o output.mp3 -c mp3 -q=3 ```
sh
$ cat input.txt | aspeak text
or
sh
$ aspeak text -f input.txt
with custom encoding:
sh
$ aspeak text -f input.txt -e gbk
sh
$ aspeak text
maybe you prefer:
sh
$ aspeak text -l zh-CN << EOF
我能吞下玻璃而不伤身体。
EOF
sh
$ aspeak text "你好,世界!" -l zh-CN
sh
$ aspeak text "你好,世界!" -v zh-CN-YunjianNeural
sh
$ aspeak text "你好,世界!" -v zh-CN-XiaoxiaoNeural -p 1.5 -r 0.5 -S sad
$ aspeak text "你好,世界!" -v zh-CN-XiaoxiaoNeural -p=-10% -r=+5% -S cheerful
$ aspeak text "你好,世界!" -v zh-CN-XiaoxiaoNeural -p=+40Hz -r=1.2f -S fearful
$ aspeak text "你好,世界!" -v zh-CN-XiaoxiaoNeural -p=high -r=x-slow -S calm
$ aspeak text "你好,世界!" -v zh-CN-XiaoxiaoNeural -p=+1st -r=-7% -S lyrical
Note: Some audio formats are not supported when you are outputting to speaker.
sh
$ aspeak text "Hello World" -F riff-48khz-16bit-mono-pcm -o high-quality.wav
The new version of aspeak
is written in Rust, and the Python binding is provided by PyO3.
Here is a simple example:
```python from aspeak import SpeechService, AudioFormat
service = SpeechService() service.connect() service.speak_text("Hello, world") ```
Add aspeak
to your Cargo.toml
:
bash
$ cargo add aspeak
Then follow the documentation of aspeak
crate.