Deep Learning

Improved Parallel WaveGAN with perceptually weighted spectrogram loss

Preprint: arXiv:2101.07412 (accepted to SLT 2021)

Eunwoo Song, Ryuichi Yamamoto, Min-Jae Hwang, Jin-Seob Kim, Ohsung Kwon, Jae-Min Kim

TTS-by-TTS: TTS-driven Data Augmentation for Fast and High-Quality Speech Synthesis

Preprint: arXiv:2010.13421 (accepted to ICASSP 2021)

Min-Jae Hwang, Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim

Parallel waveform synthesis based on generative adversarial networks with voicing-aware conditional discriminators

Preprint: arXiv:2010.14151 (accepted to ICASSP 2021)

Ryuichi Yamamoto, Eunwoo Song, Min-Jae Hwang, Jae-Min Kim

NNSVS: A PyTorch-based singing voice synthesis library for research

Neural network based singing voice synthesis: https://github.com/r9y9/nnsvs

May 10, 2020 1 min read

Neural text-to-speech with a modeling-by-generation excitation vocoder

Preprint: arXiv:2008.00132, Published version: ISCA Archive Interspeech 2020

Eunwoo Song, Min-Jae Hwang, Ryuichi Yamamoto, Jin-Seob Kim, Ohsung Kwon, Jae-Min Kim

ESPnet-TTS: A toolkit to accelerate research on end-to-end speech synthesis @ ASJ 2020s

Mar 16, 2020 1:00 PM — 1:30 PM

Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Kazuya Takeda, Tomoki Toda, Shinji Watanabe

ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit

Preprint: arXiv:1910.10909 (accepted to ICASSP 2020)

Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, Xu Tan

Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

Preprint: arXiv:1910.11480 (accepted to ICASSP 2020)

Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim

Probability Density Distillation with Generative Adversarial Networks for High-Quality Parallel Waveform Generation

Preprint: arXiv:1904.04472, Published version: ISCA Archive Interspeech 2019

Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim

I built a WN-based TTS system / Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions [arXiv:1712.05884]

Audio samples: https://r9y9.github.io/wavenet_vocoder/

May 20, 2018 3 min read