ttslearn.pretrained
A module for managing pretrained models.
Pretrained models are downloaded automatically the first time you run TTS functionality (e.g., ttslearn.dnntts.tts.DNNTTS).
By default, the models are saved in $HOME/.cache/ttslearn/.
To change the save location, set the environment variable TTSLEARN_CACHE_DIR.
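The lookup described above can be sketched as follows. This is a minimal illustration, assuming the library checks TTSLEARN_CACHE_DIR first and otherwise falls back to $HOME/.cache/ttslearn/. The helper name resolve_cache_dir is hypothetical and is not part of ttslearn.

```python
import os
from pathlib import Path


def resolve_cache_dir(env=None):
    """Hypothetical helper (not a ttslearn API): pick the model cache directory."""
    env = os.environ if env is None else env
    override = env.get("TTSLEARN_CACHE_DIR")
    if override:
        # The user asked for a custom location through the environment variable.
        return Path(override).expanduser()
    # Default location stated in the documentation: $HOME/.cache/ttslearn/
    return Path.home() / ".cache" / "ttslearn"


if __name__ == "__main__":
    print(resolve_cache_dir())
```

In practice, the variable would be exported in the shell or assigned through os.environ before TTS functionality first runs. Setting it before importing ttslearn is the cautious choice, since the point at which the library reads the variable is not stated here.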
Pretrained models
All models listed here were trained on the JSUT corpus.
| Model ID | Class | Details of the model |
|---|---|---|
| | | DNN-based statistical parametric speech synthesis (sec. 6) |
| | | WaveNet TTS (sec. 8) |
| | | An end-to-end TTS based on Tacotron 2 (sec. 10) |
Extra pretrained models
The following models are not covered in our book. They were trained using extra recipes available in our GitHub repository.
| Model ID | Corpus | Class | Details of the model |
|---|---|---|---|
| | JSUT | | Tacotron 2 with Parallel WaveGAN (PWG). Trained on JSUT corpus. Sampling rate: 16 kHz. |
| | JSUT | | Tacotron 2 with PWG. Trained on JSUT corpus. Sampling rate: 24 kHz. |
| | JSUT | | Tacotron 2 with HiFi-GAN. Trained on JSUT corpus. Sampling rate: 24 kHz. |
| | JVS | | Multi-speaker Tacotron 2 with PWG. Trained on JVS corpus. Sampling rate: 16 kHz. |
| | JVS | | Multi-speaker Tacotron 2 with PWG. Trained on JVS corpus. Sampling rate: 24 kHz. |
| | JVS | | Multi-speaker Tacotron 2 with HiFi-GAN. Trained on JVS corpus. Sampling rate: 24 kHz. |
| | Common Voice | | Multi-speaker Tacotron 2 with PWG. Trained on Common Voice (ja) corpus. Sampling rate: 16 kHz. |
| | Common Voice | | Multi-speaker Tacotron 2 with PWG. Trained on Common Voice (ja) corpus. Sampling rate: 24 kHz. |
Helpers
| | Create a TTS engine from the official pretrained models. |
| | Get the names of the available pretrained models. |
| | Retrieve a pretrained model from the local cache, or download it from GitHub. |
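The "local cache or download" behavior described for the last helper follows a common pattern, sketched below. This is an illustration only, not ttslearn's implementation: the function retrieve_from_cache, its parameters, and the injected download callback are all hypothetical names introduced here.

```python
from pathlib import Path
from typing import Callable


def retrieve_from_cache(
    model_id: str,
    cache_dir: Path,
    download: Callable[[str, Path], None],
) -> Path:
    """Hypothetical sketch: return the local model directory, downloading only if missing."""
    model_dir = Path(cache_dir) / model_id
    if not model_dir.exists():
        # Cache miss: make sure the cache root exists, then fetch the model.
        model_dir.parent.mkdir(parents=True, exist_ok=True)
        download(model_id, model_dir)
    # Cache hit, or freshly downloaded: the caller gets the same path either way.
    return model_dir
```

Passing the downloader as a callback keeps the cache logic free of network code, so the sketch can be exercised with a stub that simply creates the target directory.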