ttslearn.dnntts

A module for DNN-based speech synthesis.

TTS

The TTS functionality is accessible from ttslearn.dnntts.*

class ttslearn.dnntts.tts.DNNTTS(model_dir=None, device='cpu')

DNN-based text-to-speech

Parameters
  • model_dir (str) – model directory. A pre-trained model (ID: dnntts) is used if None.

  • device (str) – cpu or cuda.

Examples:

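import librosa.display
import matplotlib.pyplot as plt
import numpy as np
from ttslearn.dnntts import DNNTTS

# Load the pre-trained model (ID: dnntts) and synthesize speech.
# The input text means "This is a demo of Japanese speech synthesis."
engine = DNNTTS()
wav, sr = engine.tts("日本語音声合成のデモです。")

# Plot the waveform. Newer librosa releases provide waveshow instead of waveplot.
fig, ax = plt.subplots(figsize=(8, 2))
librosa.display.waveplot(wav.astype(np.float32), sr=sr, ax=ax)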
[Figure: waveform of the synthesized speech]

set_device(device)

Set device for the TTS models

Parameters

device (str) – cpu or cuda.

tts(text, post_filter=True, tqdm=None)

Run TTS

Parameters
  • text (str) – Input text

  • post_filter (bool, optional) – Use post-filter or not. Defaults to True.

  • tqdm (object, optional) – tqdm object. Defaults to None.

Returns

audio array (np.int16) and sampling rate (int)

Return type

tuple
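
Because tts returns 16-bit integer samples together with the sampling rate, the result can be written to a WAV file without further conversion. The following is a minimal sketch using only NumPy and the standard-library wave module. The helper name save_wav, the 16 kHz sampling rate, and the sine wave standing in for the engine output are illustrative assumptions, not part of ttslearn.

```python
import os
import tempfile
import wave

import numpy as np


def save_wav(path, wav, sr):
    """Write a mono int16 waveform to a 16-bit PCM WAV file (hypothetical helper)."""
    wav = np.asarray(wav)
    if wav.dtype != np.int16:
        raise ValueError("expected np.int16 samples")
    with wave.open(path, "wb") as f:
        f.setnchannels(1)  # mono
        f.setsampwidth(2)  # 2 bytes per sample = 16 bit
        f.setframerate(sr)
        f.writeframes(wav.astype("<i2").tobytes())  # little-endian PCM


# Stand-in for `wav, sr = engine.tts(...)`: one second of a 440 Hz tone.
# The sampling rate here is an assumption made for the example.
sr = 16000
t = np.arange(sr) / sr
wav = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

out_path = os.path.join(tempfile.gettempdir(), "dnntts_demo.wav")
save_wav(out_path, wav, sr)
```

With a real engine, the two stand-in lines would be replaced by the call to engine.tts shown in the example above.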

Models

The following models are accessible from ttslearn.dnntts.*

Feed-forward DNN

class ttslearn.dnntts.model.DNN(in_dim, hidden_dim, out_dim, num_layers=2)

Feed-forward neural network

Parameters
  • in_dim (int) – input dimension

  • hidden_dim (int) – hidden dimension

  • out_dim (int) – output dimension

  • num_layers (int) – number of layers

forward(seqs, lens=None)

Forward step

Parameters
  • seqs (torch.Tensor) – input sequences

  • lens (torch.Tensor) – length of input sequences

Returns

output sequences

Return type

torch.Tensor
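
To illustrate the shape contract of such a model, here is a minimal PyTorch sketch of a feed-forward network with the same constructor arguments. The class name FeedForwardSketch, the choice of ReLU activations, and the way num_layers is counted are assumptions made for this example. It is not the library's actual implementation.

```python
import torch
from torch import nn


class FeedForwardSketch(nn.Module):
    """Illustrative feed-forward network (hypothetical, not ttslearn code)."""

    def __init__(self, in_dim, hidden_dim, out_dim, num_layers=2):
        super().__init__()
        # Assumption: num_layers counts the hidden layers.
        layers = [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
        for _ in range(num_layers - 1):
            layers += [nn.Linear(hidden_dim, hidden_dim), nn.ReLU()]
        layers.append(nn.Linear(hidden_dim, out_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, seqs, lens=None):
        # Every frame is transformed independently, so lens is not needed.
        return self.net(seqs)


model = FeedForwardSketch(in_dim=10, hidden_dim=32, out_dim=4)
seqs = torch.randn(2, 50, 10)  # (batch, frames, in_dim)
out = model(seqs)
print(out.shape)  # torch.Size([2, 50, 4])
```

Since a feed-forward network has no recurrence, padded frames do not influence the valid ones, which is consistent with lens being optional in the signature above.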

LSTM-RNN

class ttslearn.dnntts.model.LSTMRNN(in_dim, hidden_dim, out_dim, num_layers=1, bidirectional=True, dropout=0.0)

LSTM-based recurrent neural networks

Parameters
  • in_dim (int) – input dimension

  • hidden_dim (int) – hidden dimension

  • out_dim (int) – output dimension

  • num_layers (int) – number of layers

  • bidirectional (bool) – bi-directional or not.

  • dropout (float) – dropout ratio.

forward(seqs, lens)

Forward step

Parameters
  • seqs (torch.Tensor) – input sequences

  • lens (torch.Tensor) – length of input sequences

Returns

output sequences

Return type

torch.Tensor
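
Unlike the feed-forward model, a recurrent model needs the sequence lengths so that padding does not leak into the recurrent state. One common way to handle this in PyTorch is to pack the padded batch before the LSTM and unpack it afterwards. The sketch below shows that pattern under stated assumptions: the class name LSTMSketch, the batch-first layout, and the final linear projection are illustrative choices, not a copy of the library's code.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


class LSTMSketch(nn.Module):
    """Illustrative LSTM network (hypothetical, not ttslearn code)."""

    def __init__(self, in_dim, hidden_dim, out_dim, num_layers=1,
                 bidirectional=True, dropout=0.0):
        super().__init__()
        num_directions = 2 if bidirectional else 1
        self.lstm = nn.LSTM(
            in_dim, hidden_dim, num_layers,
            batch_first=True, bidirectional=bidirectional, dropout=dropout,
        )
        # A bidirectional LSTM concatenates forward and backward states.
        self.proj = nn.Linear(num_directions * hidden_dim, out_dim)

    def forward(self, seqs, lens):
        # Pack so the LSTM skips padded frames; lengths must live on the CPU.
        packed = pack_padded_sequence(
            seqs, lens.cpu(), batch_first=True, enforce_sorted=False
        )
        out, _ = self.lstm(packed)
        # Unpack back to a padded (batch, max_len, features) tensor.
        out, _ = pad_packed_sequence(out, batch_first=True)
        return self.proj(out)


model = LSTMSketch(in_dim=10, hidden_dim=16, out_dim=4)
seqs = torch.randn(2, 5, 10)   # (batch, frames, in_dim), zero-padding assumed
lens = torch.tensor([5, 3])    # valid frames per sequence
out = model(seqs, lens)
print(out.shape)  # torch.Size([2, 5, 4])
```

Passing enforce_sorted=False lets the batch arrive in any order, at the cost of an internal sort. Whether the library sorts batches beforehand is not stated in this reference.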

Multi-stream functionality

split_streams

Split streams from multi-stream features

multi_stream_mlpg

Split streams and apply MLPG if a stream has dynamic features

get_windows

Get windows for parameter generation

get_static_stream_sizes

Get static sizes for each feature stream

get_static_features

Get static features from static+dynamic features
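
The idea behind stream splitting and static-feature extraction can be sketched in plain NumPy. The function names below carry a _sketch suffix to make clear they are hypothetical stand-ins, and their signatures are not taken from the library. The example also assumes that each stream with dynamic features is laid out as [static, delta, delta-delta] and that the stream sizes are as shown, which this reference does not confirm.

```python
import numpy as np


def split_streams_sketch(feats, stream_sizes):
    """Split a (T, D) feature matrix into per-stream matrices (hypothetical)."""
    assert feats.shape[-1] == sum(stream_sizes)
    # Column indices where one stream ends and the next begins.
    boundaries = np.cumsum(stream_sizes)[:-1]
    return np.split(feats, boundaries, axis=-1)


def static_features_sketch(feats, stream_sizes, has_dynamic, num_windows=3):
    """Keep only the static part of each stream (hypothetical).

    Assumes the layout [static, delta, delta-delta] inside a dynamic stream.
    """
    statics = []
    streams = split_streams_sketch(feats, stream_sizes)
    for stream, size, dyn in zip(streams, stream_sizes, has_dynamic):
        static_dim = size // num_windows if dyn else size
        statics.append(stream[..., :static_dim])
    return statics


# Assumed toy layout: a 6-dim spectral stream and a 3-dim F0 stream
# (each static plus two dynamic parts) and a 1-dim voicing flag.
stream_sizes = [6, 3, 1]
has_dynamic = [True, True, False]
feats = np.arange(4 * 10, dtype=np.float32).reshape(4, 10)

streams = split_streams_sketch(feats, stream_sizes)
statics = static_features_sketch(feats, stream_sizes, has_dynamic)
print([s.shape for s in streams])  # [(4, 6), (4, 3), (4, 1)]
print([s.shape for s in statics])  # [(4, 2), (4, 1), (4, 1)]
```

Dividing each dynamic stream's size by the number of windows gives the static sizes, which is the quantity a function like get_static_stream_sizes would be expected to return.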

Generation utility

predict_duration

Predict phoneme durations.

predict_acoustic

Predict acoustic features.

gen_waveform

Generate waveform from acoustic features.