nnmnkwii (nanamin kawaii) documentation
A library to build speech synthesis systems, designed for easy and fast prototyping.
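As a quick taste of the kind of prototyping the library targets, here is a minimal sketch (assuming the delta_features and mlpg helpers from nnmnkwii.preprocessing and nnmnkwii.paramgen; shapes and window definitions follow common statistical parametric speech synthesis practice) that appends dynamic features to a static feature matrix and then recovers smooth trajectories with MLPG:

    import numpy as np
    from nnmnkwii.preprocessing import delta_features
    from nnmnkwii.paramgen import mlpg

    # Static, delta, and delta-delta windows commonly used in
    # statistical parametric speech synthesis.
    windows = [
        (0, 0, np.array([1.0])),             # static
        (1, 1, np.array([-0.5, 0.0, 0.5])),  # delta
        (1, 1, np.array([1.0, -2.0, 1.0])),  # delta-delta
    ]

    T, static_dim = 100, 24
    static = np.random.rand(T, static_dim)  # placeholder static features

    # Append dynamic features: (T, static_dim) -> (T, static_dim * 3)
    augmented = delta_features(static, windows)

    # Generate smooth static trajectories with MLPG from (predicted)
    # means and frame-wise variances: (T, static_dim * 3) -> (T, static_dim)
    variances = np.ones_like(augmented)
    generated = mlpg(augmented, variances, windows)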
Github: https://github.com/r9y9/nnmnkwii
Tutorial notebooks: https://github.com/r9y9/nnmnkwii_gallery
For advanced applications using the library, see External links.
External links
wavenet_vocoder: WaveNet vocoder. [6][7]
deepvoice3_pytorch: PyTorch implementation of convolutional network-based text-to-speech synthesis models. [4][5]
tacotron_pytorch: PyTorch implementation of the Tacotron speech synthesis model. [3]
gantts: PyTorch implementation of GAN-based text-to-speech synthesis and voice conversion (VC). [1][2]
icassp2020-espnet-tts-merlin-baseline: ICASSP 2020 ESPnet-TTS Merlin baseline system. [8]
[1] Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, “Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26.1 (2018): 84-96.
[2] Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dongyan Huang, and Haizhou Li, “Statistical Parametric Speech Synthesis Using Generative Adversarial Networks Under A Multi-task Learning Framework,” arXiv:1707.01670, Jul 2017.
[3] Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, et al., “Tacotron: Towards End-to-End Speech Synthesis,” arXiv:1703.10135, Mar 2017.
[4] Wei Ping, Kainan Peng, Andrew Gibiansky, et al., “Deep Voice 3: 2000-Speaker Neural Text-to-Speech,” arXiv:1710.07654, Oct 2017.
[5] Hideyuki Tachibana, Katsuya Uenoyama, and Shunsuke Aihara, “Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention,” arXiv:1710.08969, Oct 2017.
[6] Aaron van den Oord, Sander Dieleman, Heiga Zen, et al., “WaveNet: A Generative Model for Raw Audio,” arXiv:1609.03499, Sep 2016.
[7] Akira Tamamori, Tomoki Hayashi, Kazuhiro Kobayashi, et al., “Speaker-dependent WaveNet vocoder,” Proceedings of Interspeech, 2017.
[8] T. Hayashi, R. Yamamoto, K. Inoue, T. Yoshimura, S. Watanabe, T. Toda, K. Takeda, Y. Zhang, and X. Tan, “ESPnet-TTS: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit,” arXiv:1910.10909, 2019.