{
"cells": [
{
"cell_type": "markdown",
"id": "norman-supervisor",
"metadata": {},
"source": [
"# Quick start"
]
},
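{
"cell_type": "markdown",
"id": "demonstrated-audit",
"metadata": {},
"source": [
"[Open in Colab](https://colab.research.google.com/github/r9y9/ttslearn/blob/master/notebooks/ch00_Quick-start.ipynb)\n",
"\n",
"Welcome to the quick start page for *Pythonで学ぶ音声合成*!\n",
"\n",
"This page (a notebook) presents sample code and audio samples for the three speech synthesis systems covered in the book. If you would like to get hands-on before reading the explanations, this notebook is a good first step.\n",
"\n",
"Pretrained models for the speech synthesis systems shown here are distributed through the GitHub repository. Don't just listen to the audio samples; try synthesizing speech yourself.\n",
"To understand the details of each method, refer to the source code together with the book."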
]
},
{
"cell_type": "markdown",
"id": "productive-chrome",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "markdown",
"id": "victorian-atlas",
"metadata": {},
"source": [
"### Installing ttslearn"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "falling-performance",
"metadata": {},
"outputs": [],
"source": [
"%%capture\n",
"# Install ttslearn only if it is not already available\n",
"try:\n",
"    import ttslearn\n",
"except ImportError:\n",
"    !pip install ttslearn"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "missing-suite",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'0.2.2'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import ttslearn\n",
"ttslearn.__version__"
]
},
{
"cell_type": "markdown",
"id": "absent-tracy",
"metadata": {},
"source": [
"### Importing packages"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "improving-removal",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Populating the interactive namespace from numpy and matplotlib\n"
]
}
],
"source": [
"%pylab inline\n",
"import IPython\n",
"from IPython.display import Audio\n",
"import librosa\n",
"import librosa.display\n",
"from tqdm.notebook import tqdm\n",
"import torch"
]
},
{
"cell_type": "markdown",
"id": "magnetic-driver",
"metadata": {},
"source": [
"## DNN-based speech synthesis (Chapters 5 and 6)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "authorized-voice",
"metadata": {},
"outputs": [],
"source": [
"# Create the DNN-based TTS engine\n",
"from ttslearn.dnntts import DNNTTS\n",
"dnntts_engine = DNNTTS()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "sorted-fault",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.39 s, sys: 21.4 ms, total: 1.41 s\n",
"Wall time: 673 ms\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Synthesize speech from text; tts() returns the waveform and its sampling rate\n",
"%time wav, sr = dnntts_engine.tts(\"あらゆる現実を、すべて自分のほうへねじ曲げたのだ。\")\n",
"IPython.display.display(Audio(wav, rate=sr))"
]
},
{
"cell_type": "markdown",
"id": "suitable-aquatic",
"metadata": {},
"source": [
"## WaveNet speech synthesis (Chapters 7 and 8)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "cardiac-elizabeth",
"metadata": {},
"outputs": [],
"source": [
"# Create the WaveNet TTS engine\n",
"from ttslearn.wavenet import WaveNetTTS\n",
"wavenet_engine = WaveNetTTS()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "rolled-contrary",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "e6988acd4e724566a86161cb49b42255",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/52640 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 15min 29s, sys: 12.3 s, total: 15min 41s\n",
"Wall time: 3min 55s\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Pass tqdm to display a progress bar while the waveform is generated\n",
"%time wav, sr = wavenet_engine.tts(\"小さな鰻屋に、熱気のようなものがみなぎる\", tqdm=tqdm)\n",
"IPython.display.display(Audio(wav, rate=sr))"
]
},
{
"cell_type": "markdown",
"id": "alike-music",
"metadata": {},
"source": [
"## Tacotron 2 (Chapters 9 and 10)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "legal-leone",
"metadata": {},
"outputs": [],
"source": [
"# Create the Tacotron 2 TTS engine\n",
"from ttslearn.tacotron import Tacotron2TTS\n",
"tacotron_engine = Tacotron2TTS()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "caroline-damages",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "03e4fc4091e046b1b1b6b33606649734",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/51200 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 13min 32s, sys: 10.4 s, total: 13min 43s\n",
"Wall time: 3min 25s\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Pass tqdm to display a progress bar while the waveform is generated\n",
"%time wav, sr = tacotron_engine.tts(\"昼にはペスカトーレを、夜には寿司をパクパク食べた。\", tqdm=tqdm)\n",
"IPython.display.display(Audio(wav, rate=sr))"
]
},
{
"cell_type": "markdown",
"id": "institutional-visitor",
"metadata": {},
"source": [
"## Closing remarks"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "clear-tobacco",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"これから音声合成を始める皆様にとって、少しでも学習の助けになれば幸いです。\n",
"CPU times: user 1.89 s, sys: 40 ms, total: 1.93 s\n",
"Wall time: 1.09 s\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "f94813b97c414e3790808100ef8428ee",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/97680 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 27min 14s, sys: 18.3 s, total: 27min 33s\n",
"Wall time: 6min 53s\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a90f5a7703d84d9a8f4f370e6a58073a",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/92400 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 23min 16s, sys: 17.2 s, total: 23min 33s\n",
"Wall time: 5min 53s\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Synthesize the same sentence with all three engines for comparison\n",
"text = \"これから音声合成を始める皆様にとって、少しでも学習の助けになれば幸いです。\"\n",
"print(text)\n",
"\n",
"# The name labels each engine; the outputs appear in this order\n",
"for name, engine in [\n",
"    (\"DNNTTS\", dnntts_engine),\n",
"    (\"WaveNet TTS\", wavenet_engine),\n",
"    (\"Tacotron 2\", tacotron_engine),\n",
"]:\n",
"    %time wav, sr = engine.tts(text, tqdm=tqdm)\n",
"    IPython.display.display(Audio(wav, rate=sr))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}