Baseline

Generic baseline algorithms that can be used as building blocks.

GMM voice conversion

class nnmnkwii.baseline.gmm.MLPG(gmm, windows=None, swap=False, diff=False)

Maximum Likelihood Parameter Generation (MLPG) for GMM-based voice conversion [1].

Notes

  • Source speaker’s feature: X = {x_t}, 0 <= t < T

  • Target speaker’s feature: Y = {y_t}, 0 <= t < T

where T is the number of time frames.

See paper [1] for details.

The code was adapted from https://gist.github.com/r9y9/88bda659c97f46f42525.
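For orientation, the conversion can be summarized as follows. This is a sketch of the standard formulation in [1] under the single (suboptimal) mixture sequence approximation; that this class implements exactly this variant is an assumption. Here x_t and y_t include dynamic features, the μ and Σ blocks are the per-mixture parameters stored in the src_means, tgt_means, covarXX, covarXY, covarYX and covarYY attributes, c is the static target sequence, and W is the matrix built from `windows` that maps c to static plus dynamic features.

```latex
\hat{m}_t = \arg\max_m P(m \mid x_t)

E_t = \mu^{(Y)}_{\hat{m}_t}
    + \Sigma^{(YX)}_{\hat{m}_t} \left(\Sigma^{(XX)}_{\hat{m}_t}\right)^{-1}
      \left(x_t - \mu^{(X)}_{\hat{m}_t}\right)

D_m = \Sigma^{(YY)}_m - \Sigma^{(YX)}_m \left(\Sigma^{(XX)}_m\right)^{-1} \Sigma^{(XY)}_m

\hat{c} = \left(W^\top \bar{D}^{-1} W\right)^{-1} W^\top \bar{D}^{-1} \bar{E},
\quad \bar{E} = \left[E_1^\top, \dots, E_T^\top\right]^\top,
\quad \bar{D} = \operatorname{diag}\left(D_{\hat{m}_1}, \dots, D_{\hat{m}_T}\right)
```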

Parameters
  • gmm (sklearn.mixture.GaussianMixture) – Gaussian mixture model trained on joint source and target features.

  • windows (list) – List of windows. See nnmnkwii.functions.mlpg() for details.

  • swap (bool) – If True, convert target -> source by swapping the roles of the two halves of the joint features; otherwise (the default) convert source -> target.

  • diff (bool) – If True, convert the GMM into a DIFFGMM, which models the source features jointly with the source-to-target differences (y - x).
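The `diff` and `swap` options act on the parameters of the joint GMM. A minimal NumPy sketch, assuming a full-covariance joint GMM whose means and covariances are laid out as [source | target] (as in the Examples section); the function name `split_joint_gmm` is hypothetical, and the order in which `diff` and `swap` are applied is an assumption rather than a description of the class internals:

```python
import numpy as np


def split_joint_gmm(means, covariances, diff=False, swap=False):
    """Split full-covariance joint-GMM parameters into source/target blocks.

    means:       (M, 2 * D) joint means laid out as [source | target]
    covariances: (M, 2 * D, 2 * D) full joint covariances
    """
    D = means.shape[1] // 2
    mu_x, mu_y = means[:, :D], means[:, D:]
    xx = covariances[:, :D, :D]
    xy = covariances[:, :D, D:]
    yx = covariances[:, D:, :D]
    yy = covariances[:, D:, D:]

    if diff:
        # DIFFGMM: replace the target y by the difference d = y - x.
        mu_y = mu_y - mu_x
        yy = xx + yy - xy - yx       # Cov(d, d)
        xy = xy - xx                 # Cov(x, d)
        yx = xy.transpose(0, 2, 1)   # Cov(d, x) = Cov(x, d)^T

    if swap:
        # Exchange the roles of the two halves (convert in the other direction).
        mu_x, mu_y = mu_y, mu_x
        xx, yy = yy, xx
        xy, yx = yx, xy

    return mu_x, mu_y, xx, xy, yx, yy
```

Under this sketch, with diff=True the "target" block describes d = y - x, so a predicted difference is added to the source features to recover y.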

Attributes
  • num_mixtures (int) – The number of Gaussian mixture components.

  • weights (array) – shape (num_mixtures,); mixture weights.

  • src_means (array) – shape (num_mixtures, dim); means of the source speaker’s features, where dim is the dimension of one speaker’s feature vector (static plus dynamic features).

  • tgt_means (array) – shape (num_mixtures, dim); means of the target speaker’s features.

  • covarXX (array) – shape (num_mixtures, dim, dim); covariance matrices of the source speaker’s features.

  • covarXY (array) – shape (num_mixtures, dim, dim); cross-covariance matrices between source and target features.

  • covarYX (array) – shape (num_mixtures, dim, dim); cross-covariance matrices between target and source features (the transposes of covarXY).

  • covarYY (array) – shape (num_mixtures, dim, dim); covariance matrices of the target speaker’s features.

  • D (array) – shape (num_mixtures, dim, dim); conditional covariance matrices of the target features given the source features, D[m] = covarYY[m] - covarYX[m] covarXX[m]^-1 covarXY[m].

  • px (sklearn.mixture.GaussianMixture) – Marginal GMM p(x) of the source speaker’s features.
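The roles of `px` and `D` can be illustrated with plain NumPy. This is an illustrative sketch, not the class's code: `log_gauss`, `mixture_posteriors` and `conditional_covariances` are hypothetical helpers, full covariances are assumed, and in the class itself the posteriors would come from `px` (for example via `px.predict_proba`).

```python
import numpy as np


def log_gauss(x, mean, cov):
    """Log density of a full-covariance Gaussian at each row of x, shape (T,)."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance per frame.
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (x.shape[1] * np.log(2.0 * np.pi) + logdet + maha)


def mixture_posteriors(x, weights, mu_x, xx):
    """P(m | x_t) under the marginal source GMM p(x), shape (T, M)."""
    log_joint = np.stack(
        [np.log(w) + log_gauss(x, mu, S) for w, mu, S in zip(weights, mu_x, xx)],
        axis=1,
    )
    # Subtract the per-frame maximum for numerical stability before exponentiating.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def conditional_covariances(xx, xy, yx, yy):
    """D[m] = YY[m] - YX[m] XX[m]^{-1} XY[m] for every mixture m (batched)."""
    return yy - yx @ np.linalg.solve(xx, xy)
```

The conditional covariance is the Schur complement of the source block, so it equals the inverse of the target block of the joint precision matrix.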

Examples

>>> from sklearn.mixture import GaussianMixture
>>> from nnmnkwii.baseline.gmm import MLPG
>>> import numpy as np
>>> static_dim, T = 24, 10
>>> windows = [
...     (0, 0, np.array([1.0])),
...     (1, 1, np.array([-0.5, 0.0, 0.5])),
...     (1, 1, np.array([1.0, -2.0, 1.0])),
... ]
>>> src = np.random.rand(T, static_dim * len(windows))
>>> tgt = np.random.rand(T, static_dim * len(windows))
>>> XY = np.concatenate((src, tgt), axis=-1) # pseudo parallel data
>>> gmm = GaussianMixture(n_components=4)
>>> _ = gmm.fit(XY)
>>> paramgen = MLPG(gmm, windows=windows)
>>> generated = paramgen.transform(src)
>>> assert generated.shape == (T, static_dim)
[1] Toda et al. (2007). Voice Conversion Based on Maximum Likelihood Estimation of Spectral Parameter Trajectory.

transform(src)

Map source features x to target features y by maximizing the likelihood of y given x.

Parameters

src (array) – shape (number of frames, feature dimension including dynamic features); the sequence of the source speaker’s spectral features to be transformed.

Returns

The converted sequence of target static features, shape (number of frames, static feature dimension).

Return type

array
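To make the steps of `transform` concrete, here is a self-contained NumPy sketch. It assumes the suboptimal mixture sequence approximation, keeps only the diagonal of each D[m] so that MLPG can run one static dimension at a time, and treats frames outside the utterance as zero; none of this is confirmed by the text above. The helpers `window_matrix` and `mlpg_sketch` are hypothetical, the mixture sequence `mix` is passed in (for example from `px.predict(src)`), and features are assumed to be laid out as [static | delta | ...] blocks of size d, matching the example.

```python
import numpy as np


def window_matrix(T, windows):
    """Stack one (T, T) band matrix per window: W has shape (len(windows) * T, T).

    Each window is (left, right, coef); coef[j] multiplies frame t + j - left.
    Frames outside [0, T) are dropped, i.e. zero padding (an assumption).
    """
    W = np.zeros((len(windows) * T, T))
    for k, (left, _right, coef) in enumerate(windows):
        for t in range(T):
            for j, c in enumerate(coef):
                tau = t + j - left
                if 0 <= tau < T:
                    W[k * T + t, tau] = c
    return W


def mlpg_sketch(src, mix, src_means, tgt_means, covarXX, covarYX, D, windows):
    """Suboptimal-mixture-sequence MLPG with diagonal conditional variances.

    src: (T, K * d) source features; mix: (T,) mixture index per frame.
    Returns (T, d) static target features.
    """
    T, dim = src.shape
    K = len(windows)
    d = dim // K

    # Per-frame conditional mean E[t] and diagonal variance V[t] of y given x.
    E = np.empty((T, dim))
    V = np.empty((T, dim))
    for t, m in enumerate(mix):
        E[t] = tgt_means[m] + covarYX[m] @ np.linalg.solve(covarXX[m], src[t] - src_means[m])
        V[t] = np.diag(D[m])

    W = window_matrix(T, windows)
    out = np.empty((T, d))
    for i in range(d):
        # Gather the K streams of static dimension i in window-major order (k * T + t).
        mu = np.concatenate([E[:, k * d + i] for k in range(K)])
        prec = 1.0 / np.concatenate([V[:, k * d + i] for k in range(K)])
        WtP = W.T * prec  # W^T diag(prec)
        # Solve (W^T P W) c = W^T P mu for the static trajectory c.
        out[:, i] = np.linalg.solve(WtP @ W, WtP @ mu)
    return out
```

A dense solve keeps the sketch short; a practical implementation would exploit the banded structure of W^T P W instead.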