ttslearn.pretrained

A module for managing pretrained models.

Pretrained models are downloaded automatically the first time you run TTS functionality (e.g., `ttslearn.dnntts.tts.DNNTTS`). By default, the models are saved in `$HOME/.cache/ttslearn/`. To change the save location, set the environment variable `TTSLEARN_CACHE_DIR`.
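The cache-location rule can be sketched as follows. This is only an illustration of the documented behavior, not the library's own code: `resolve_cache_dir` is a hypothetical helper, and the custom directory is an arbitrary example path.

```python
import os
from pathlib import Path


def resolve_cache_dir(env=None):
    """Return the directory where pretrained models would be cached.

    Hypothetical helper (not part of ttslearn) mirroring the documented rule:
    use TTSLEARN_CACHE_DIR if it is set, otherwise $HOME/.cache/ttslearn/.
    """
    env = os.environ if env is None else env
    default_dir = Path.home() / ".cache" / "ttslearn"
    return Path(env.get("TTSLEARN_CACHE_DIR", default_dir))


# Point the cache at a custom location (example path only). Setting the
# variable before importing ttslearn is the cautious choice, on the
# assumption that the library may read it at import time.
os.environ["TTSLEARN_CACHE_DIR"] = str(Path.home() / "models" / "ttslearn")
print(resolve_cache_dir())
```

The same effect can be had by exporting `TTSLEARN_CACHE_DIR` in the shell before starting Python.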

Pretrained models

All the models listed here were trained on the JSUT corpus.

Model ID	Class	Details of the model
`dnntts`	`ttslearn.dnntts.tts.DNNTTS`	DNN-based statistical parametric speech synthesis (sec. 6)
`wavenettts`	`ttslearn.wavenet.tts.WaveNetTTS`	WaveNet TTS (sec. 8)
`tacotron2`	`ttslearn.tacotron.tts.Tacotron2TTS`	An end-to-end TTS based on Tacotron 2 (sec. 10)

Extra pretrained models

Note that the following models are not explained in our book. They were trained using extra recipes found in our GitHub repository.

Model ID	Corpus	Class	Details of the model
`tacotron2_pwg_jsut16k`	JSUT	`ttslearn.contrib.tacotron2_pwg.Tacotron2PWGTTS`	Tacotron 2 with Parallel WaveGAN (PWG). Trained on JSUT corpus. Sampling rate: 16 kHz.
`tacotron2_pwg_jsut24k`	JSUT	`ttslearn.contrib.tacotron2_pwg.Tacotron2PWGTTS`	Tacotron 2 with PWG. Trained on JSUT corpus. Sampling rate: 24 kHz.
`tacotron2_hifipwg_jsut24k`	JSUT	`ttslearn.contrib.tacotron2_pwg.Tacotron2PWGTTS`	Tacotron 2 with HiFi-GAN. Trained on JSUT corpus. Sampling rate: 24 kHz.
`multspk_tacotron2_pwg_jvs16k`	JVS	`ttslearn.contrib.tacotron2_pwg.Tacotron2PWGTTS`	Multi-speaker Tacotron 2 with PWG. Trained on JVS corpus. Sampling rate: 16 kHz.
`multspk_tacotron2_pwg_jvs24k`	JVS	`ttslearn.contrib.tacotron2_pwg.Tacotron2PWGTTS`	Multi-speaker Tacotron 2 with PWG. Trained on JVS corpus. Sampling rate: 24 kHz.
`multspk_tacotron2_hifipwg_jvs24k`	JVS	`ttslearn.contrib.tacotron2_pwg.Tacotron2PWGTTS`	Multi-speaker Tacotron 2 with HiFi-GAN. Trained on JVS corpus. Sampling rate: 24 kHz.
`multspk_tacotron2_pwg_cv16k`	Common Voice	`ttslearn.contrib.tacotron2_pwg.Tacotron2PWGTTS`	Multi-speaker Tacotron 2 with PWG. Trained on Common Voice (ja) corpus. Sampling rate: 16 kHz.
`multspk_tacotron2_pwg_cv24k`	Common Voice	`ttslearn.contrib.tacotron2_pwg.Tacotron2PWGTTS`	Multi-speaker Tacotron 2 with PWG. Trained on Common Voice (ja) corpus. Sampling rate: 24 kHz.

Helpers

`create_tts_engine`	Create TTS engine from official pretrained models.
`get_available_model_ids`	Get available pretrained model names.
`retrieve_pretrained_model`	Retrieve pretrained model from local cache or download from GitHub.
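
A minimal sketch of how these helpers might be combined is shown below. `synthesize` is a hypothetical wrapper, not part of ttslearn. The sketch assumes that `create_tts_engine` accepts a model ID as its first argument, that `get_available_model_ids` returns a collection of ID strings, and that the engine's `tts` method returns a waveform together with its sampling rate. Check the API reference of each helper before relying on these details.

```python
def synthesize(text, model_id="tacotron2"):
    """Synthesize `text` with a pretrained single-speaker model.

    Hypothetical wrapper around the ttslearn.pretrained helpers.
    """
    # Imported lazily so that defining this function does not require
    # ttslearn to be installed.
    from ttslearn.pretrained import create_tts_engine, get_available_model_ids

    # Fail early with a readable message if the ID is not a known model.
    available = get_available_model_ids()
    if model_id not in available:
        raise ValueError(f"Unknown model ID {model_id!r}; choose one of {available}")

    # Assumption: the first call may download the model into the cache directory.
    engine = create_tts_engine(model_id)

    # Assumption: tts() returns a (waveform, sampling_rate) pair.
    wav, sr = engine.tts(text)
    return wav, sr
```

Calling `synthesize("こんにちは")` would then use the `tacotron2` model from the first table. The multi-speaker models are likely to need an additional speaker argument, which this sketch does not cover.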