nnmnkwii (nanamin kawaii) documentation
A library to build speech synthesis systems, designed for easy and fast prototyping.
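As a quick taste of the kind of prototyping the library targets, here is a minimal sketch (assuming the delta_features and mlpg helpers from nnmnkwii.preprocessing and nnmnkwii.paramgen; shapes and window definitions follow common statistical parametric speech synthesis practice) that appends dynamic features to a static feature matrix and then recovers smooth trajectories with MLPG:

    import numpy as np
    from nnmnkwii.preprocessing import delta_features
    from nnmnkwii.paramgen import mlpg

    # Static, delta, and delta-delta windows commonly used in
    # statistical parametric speech synthesis.
    windows = [
        (0, 0, np.array([1.0])),             # static
        (1, 1, np.array([-0.5, 0.0, 0.5])),  # delta
        (1, 1, np.array([1.0, -2.0, 1.0])),  # delta-delta
    ]

    T, static_dim = 100, 24
    static = np.random.rand(T, static_dim)  # placeholder static features

    # Append dynamic features: (T, static_dim) -> (T, static_dim * 3)
    augmented = delta_features(static, windows)

    # Generate smooth static trajectories with MLPG from (predicted)
    # means and frame-wise variances: (T, static_dim * 3) -> (T, static_dim)
    variances = np.ones_like(augmented)
    generated = mlpg(augmented, variances, windows)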
Github: https://github.com/r9y9/nnmnkwii
Tutorial notebooks: https://github.com/r9y9/nnmnkwii_gallery
For advanced applications using the library, see External links.
External links
wavenet_vocoder: WaveNet vocoder. [6][7]
deepvoice3_pytorch: PyTorch implementation of convolutional network-based text-to-speech synthesis models. [4][5]
tacotron_pytorch: PyTorch implementation of the Tacotron speech synthesis model. [3]
gantts: PyTorch implementation of GAN-based text-to-speech synthesis and voice conversion (VC). [1][2]
icassp2020-espnet-tts-merlin-baseline: ICASSP 2020 ESPnet-TTS Merlin baseline system. [8]
[1] Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, “Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26.1 (2018): 84-96.
[2] Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dongyan Huang, and Haizhou Li, “Statistical Parametric Speech Synthesis Using Generative Adversarial Networks Under A Multi-task Learning Framework,” arXiv:1707.01670, Jul 2017.
[3] Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, et al., “Tacotron: Towards End-to-End Speech Synthesis,” arXiv:1703.10135, Mar 2017.
[4] Wei Ping, Kainan Peng, Andrew Gibiansky, et al., “Deep Voice 3: 2000-Speaker Neural Text-to-Speech,” arXiv:1710.07654, Oct 2017.
[5] Hideyuki Tachibana, Katsuya Uenoyama, and Shunsuke Aihara, “Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention,” arXiv:1710.08969, Oct 2017.
[6] Aaron van den Oord, Sander Dieleman, Heiga Zen, et al., “WaveNet: A Generative Model for Raw Audio,” arXiv:1609.03499, Sep 2016.
[7] Akira Tamamori, Tomoki Hayashi, Kazuhiro Kobayashi, et al., “Speaker-dependent WaveNet vocoder,” Proceedings of Interspeech, 2017.
[8] T. Hayashi, R. Yamamoto, K. Inoue, T. Yoshimura, S. Watanabe, T. Toda, K. Takeda, Y. Zhang, and X. Tan, “ESPnet-TTS: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit,” arXiv:1910.10909, 2019.