nnmnkwii (nanami) documentation
A library for building speech synthesis systems, designed for easy and fast prototyping.
- GitHub: https://github.com/r9y9/nnmnkwii
- Tutorial notebooks: https://github.com/r9y9/nnmnkwii_gallery
For advanced applications using the library, see External links.
Tutorials
Package references
Meta information
External links
- wavenet_vocoder: WaveNet vocoder [6] [7]
- deepvoice3_pytorch: PyTorch implementation of convolutional network-based text-to-speech synthesis models. [4] [5]
- tacotron_pytorch: PyTorch implementation of Tacotron speech synthesis model. [3]
- gantts: PyTorch implementation of GAN-based text-to-speech synthesis and voice conversion (VC). [1] [2]
[1] Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, “Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks”, IEEE/ACM Transactions on Audio, Speech, and Language Processing 26.1 (2018): 84-96.
[2] Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dongyan Huang, and Haizhou Li, “Statistical Parametric Speech Synthesis Using Generative Adversarial Networks Under a Multi-task Learning Framework”, arXiv:1707.01670, Jul 2017.
[3] Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, et al., “Tacotron: Towards End-to-End Speech Synthesis”, arXiv:1703.10135, Mar 2017.
[4] Wei Ping, Kainan Peng, Andrew Gibiansky, et al., “Deep Voice 3: 2000-Speaker Neural Text-to-Speech”, arXiv:1710.07654, Oct 2017.
[5] Hideyuki Tachibana, Katsuya Uenoyama, and Shunsuke Aihara, “Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention”, arXiv:1710.08969, Oct 2017.
[6] Aaron van den Oord, Sander Dieleman, Heiga Zen, et al., “WaveNet: A Generative Model for Raw Audio”, arXiv:1609.03499, Sep 2016.
[7] Akira Tamamori, Tomoki Hayashi, Kazuhiro Kobayashi, et al., “Speaker-Dependent WaveNet Vocoder”, Proceedings of Interspeech, 2017.