Baseline

Generic baseline algorithms that can be used as building blocks.

GMM voice conversion

class nnmnkwii.baseline.gmm.MLPG(gmm, windows=None, swap=False, diff=False)

Maximum Likelihood Parameter Generation (MLPG) for GMM-based voice conversion [1].

Notes

  • Source speaker’s feature: X = {x_t}, 0 <= t < T
  • Target speaker’s feature: Y = {y_t}, 0 <= t < T

where T is the number of time frames.

See paper [1] for details.

The code was adapted from https://gist.github.com/r9y9/88bda659c97f46f42525.

Parameters:
  • gmm (sklearn.mixture.GaussianMixture) – Gaussian Mixture Models of source and target joint features.
  • windows (list) – List of windows. See nnmnkwii.functions.mlpg() for details.
  • swap (bool) – If True, swap the roles of source and target, i.e. convert target -> source; otherwise (the default) convert source -> target.
  • diff (bool) – Convert GMM -> DIFFGMM if True.

num_mixtures

The number of Gaussian mixtures.

Type: int

weights

Weights for each Gaussian, shape (num_mixtures,).

Type: array

src_means

Means of the GMM for the source speaker, shape (num_mixtures, order of spectral feature).

Type: array

tgt_means

Means of the GMM for the target speaker, shape (num_mixtures, order of spectral feature).

Type: array

covarXX

Covariance matrices of the source speaker’s spectral features, shape (num_mixtures, order of spectral feature, order of spectral feature).

Type: array

covarXY

Cross-covariance matrices of the source and target speakers’ spectral features, shape (num_mixtures, order of spectral feature, order of spectral feature).

Type: array

covarYX

Cross-covariance matrices of the target and source speakers’ spectral features, shape (num_mixtures, order of spectral feature, order of spectral feature).

Type: array

covarYY

Covariance matrices of the target speaker’s spectral features, shape (num_mixtures, order of spectral feature, order of spectral feature).

Type: array

D

Covariance matrices of target static spectral features, shape (num_mixtures, order of spectral feature, order of spectral feature).

Type: array

px

Gaussian mixture model of the source speaker’s features.

Type: sklearn.mixture.GaussianMixture
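
As a rough illustration of how the block attributes above relate to the parameters of a joint GMM, the sketch below slices joint means and full covariance matrices into source and target parts. It assumes the joint feature vector stacks source features first and target features second, and it uses random NumPy arrays in place of a fitted model. split_joint_params is a hypothetical helper written for this illustration, not part of nnmnkwii.

```python
import numpy as np

def split_joint_params(means, covars):
    """Split joint GMM parameters into source/target blocks.

    means:  (num_mixtures, 2 * D)
    covars: (num_mixtures, 2 * D, 2 * D), full covariance matrices
    Assumes the first D dimensions belong to the source speaker.
    """
    D = means.shape[1] // 2
    return {
        "src_means": means[:, :D],
        "tgt_means": means[:, D:],
        "covarXX": covars[:, :D, :D],
        "covarXY": covars[:, :D, D:],
        "covarYX": covars[:, D:, :D],
        "covarYY": covars[:, D:, D:],
    }

# Stand-in parameters for a 4-mixture joint GMM over 2 * 6 dimensions.
rng = np.random.default_rng(0)
num_mixtures, D = 4, 6
means = rng.standard_normal((num_mixtures, 2 * D))
A = rng.standard_normal((num_mixtures, 2 * D, 2 * D))
# A @ A^T plus a small diagonal term is symmetric positive definite.
covars = A @ A.transpose(0, 2, 1) + 1e-3 * np.eye(2 * D)

blocks = split_joint_params(means, covars)
```

Because each joint covariance matrix is symmetric, the covarYX block of every mixture is the transpose of its covarXY block.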

Examples

>>> from sklearn.mixture import GaussianMixture
>>> from nnmnkwii.baseline.gmm import MLPG
>>> import numpy as np
>>> static_dim, T = 24, 10
>>> windows = [
...     (0, 0, np.array([1.0])),
...     (1, 1, np.array([-0.5, 0.0, 0.5])),
...     (1, 1, np.array([1.0, -2.0, 1.0])),
... ]
>>> src = np.random.rand(T, static_dim * len(windows))
>>> tgt = np.random.rand(T, static_dim * len(windows))
>>> XY = np.concatenate((src, tgt), axis=-1) # pseudo parallel data
>>> gmm = GaussianMixture(n_components=4)
>>> _ = gmm.fit(XY)
>>> paramgen = MLPG(gmm, windows=windows)
>>> generated = paramgen.transform(src)
>>> assert generated.shape == (T, static_dim)
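
The windows in the example above follow a (left width, right width, coefficients) layout. As a hedged sketch of what they mean, the code below applies each window to a static feature sequence to build the static + delta + delta-delta features that the example replaces with random data. apply_windows is a hypothetical helper written for illustration, and padding the edges by repeating the first and last frames is an assumption; nnmnkwii's own preprocessing may treat boundaries differently.

```python
import numpy as np

def apply_windows(static, windows):
    """Stack windowed versions of `static` (T, static_dim) along the feature axis."""
    T = static.shape[0]
    outputs = []
    for left, right, coef in windows:
        # Repeat edge frames (assumption) so every frame has full context.
        padded = np.pad(static, ((left, right), (0, 0)), mode="edge")
        out = np.zeros_like(static)
        # out[t] = sum_k coef[k] * static[t + k - left]
        for k, c in enumerate(coef):
            out += c * padded[k:k + T]
        outputs.append(out)
    return np.concatenate(outputs, axis=-1)

windows = [
    (0, 0, np.array([1.0])),             # static
    (1, 1, np.array([-0.5, 0.0, 0.5])),  # delta
    (1, 1, np.array([1.0, -2.0, 1.0])),  # delta-delta
]
T, static_dim = 10, 24
# A linear ramp over time, identical in every feature dimension.
static = np.linspace(0.0, 1.0, T)[:, None] * np.ones((1, static_dim))
features = apply_windows(static, windows)
```

For this linear ramp, the delta part is constant and the delta-delta part is zero at every interior frame, which is a quick sanity check on the window coefficients.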

[1] [Toda 2007] Voice Conversion Based on Maximum Likelihood Estimation of Spectral Parameter Trajectory.

transform(src)

Maps source features x to target features y so as to maximize the likelihood of y given x.

Parameters:
  • src (array) – Sequence of the source speaker’s spectral features to be transformed, shape (number of frames, order of spectral feature).
Returns: Sequence of transformed features.
Return type: array
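
For intuition about the conditional distribution of y given x that the mapping builds on, the sketch below computes the frame-wise conditional mean of a joint Gaussian mixture: a posterior-weighted sum over mixtures of tgt_mean + covarYX · covarXX⁻¹ · (x − src_mean). This is a simplified per-frame mapping under stated assumptions, not the trajectory-based generation that transform performs, and it ignores the delta windows entirely. conditional_mean and its argument names are hypothetical.

```python
import numpy as np

def conditional_mean(x, weights, src_means, tgt_means, covarXX, covarYX):
    """Frame-wise E[y | x] under a joint GMM (no dynamic-feature constraints)."""
    M = len(weights)
    log_post = np.empty(M)
    cond_means = np.empty((M, tgt_means.shape[1]))
    for m in range(M):
        diff = x - src_means[m]
        solved = np.linalg.solve(covarXX[m], diff)  # covarXX^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(covarXX[m])
        # Unnormalized log posterior of mixture m given x
        # (constants shared by all mixtures are dropped).
        log_post[m] = np.log(weights[m]) - 0.5 * (logdet + diff @ solved)
        cond_means[m] = tgt_means[m] + covarYX[m] @ solved
    # Normalize posteriors in a numerically stable way.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return post @ cond_means

# Single-mixture toy case where the answer can be checked by hand.
weights = np.array([1.0])
src_means = np.zeros((1, 2))
tgt_means = np.ones((1, 2))
covarXX = 2.0 * np.eye(2)[None]
covarYX = np.eye(2)[None]
y = conditional_mean(np.array([2.0, 4.0]), weights,
                     src_means, tgt_means, covarXX, covarYX)
```

With a single mixture the posterior is 1, so the result reduces to the familiar Gaussian conditional mean.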