Baseline

Generic baseline algorithms that can be used as building blocks.

GMM voice conversion

class nnmnkwii.baseline.gmm.MLPG(gmm, windows=None, swap=False, diff=False)

Maximum Likelihood Parameter Generation (MLPG) for GMM-based voice conversion [R2].

Notes

Source speaker’s feature: X = {x_t}, 0 <= t < T
Target speaker’s feature: Y = {y_t}, 0 <= t < T

where T is the number of time frames.

See the paper [R2] for details.

The code was adapted from https://gist.github.com/r9y9/88bda659c97f46f42525.

Parameters:
    gmm (sklearn.mixture.GaussianMixture) – Gaussian mixture model of the joint source and target features.
    windows (list) – List of windows. See `nnmnkwii.functions.mlpg()` for details.
    swap (bool) – If True, source -> target, otherwise target -> source.
    diff (bool) – Convert GMM -> DIFFGMM if True.

num_mixtures: int – The number of Gaussian mixtures

weights: array – shape (num_mixtures), weight of each Gaussian

src_means: array – shape (num_mixtures, order of spectral feature) means of the GMM for the source speaker

tgt_means: array – shape (num_mixtures, order of spectral feature) means of the GMM for the target speaker

covarXX: array – shape (num_mixtures, order of spectral feature, order of spectral feature) covariance matrix of the source speaker’s spectral feature

covarXY: array – shape (num_mixtures, order of spectral feature, order of spectral feature) cross-covariance matrix of the source and target speakers’ spectral features

covarYX: array – shape (num_mixtures, order of spectral feature, order of spectral feature) cross-covariance matrix of the target and source speakers’ spectral features

covarYY: array – shape (num_mixtures, order of spectral feature, order of spectral feature) covariance matrix of the target speaker’s spectral feature

D: array – shape (num_mixtures, order of spectral feature, order of spectral feature) covariance matrices of target static spectral features

px: sklearn.mixture.GaussianMixture – Gaussian mixture model of the source speaker’s features
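The attributes listed above can be read as blocks of the joint GMM's parameters. The following is a minimal sketch of that split, assuming the joint vector is ordered [source, target] (as in the `np.concatenate((src, tgt), axis=-1)` call in the example) and that the covariances are full matrices, as with scikit-learn's default `covariance_type="full"`. The helper `split_joint_gmm` and the random parameters are hypothetical and only illustrate the shapes; this is not the library's own code.

```python
import numpy as np


def split_joint_gmm(means, covariances):
    """Split joint [source, target] GMM parameters into per-speaker blocks.

    means:       (num_mixtures, 2 * D)
    covariances: (num_mixtures, 2 * D, 2 * D)
    """
    D = means.shape[1] // 2
    src_means, tgt_means = means[:, :D], means[:, D:]
    covarXX = covariances[:, :D, :D]  # source / source block
    covarXY = covariances[:, :D, D:]  # source / target block
    covarYX = covariances[:, D:, :D]  # target / source block
    covarYY = covariances[:, D:, D:]  # target / target block
    return src_means, tgt_means, covarXX, covarXY, covarYX, covarYY


# Hypothetical joint parameters standing in for gmm.means_ / gmm.covariances_
rng = np.random.default_rng(0)
M, D = 4, 3
means = rng.standard_normal((M, 2 * D))
A = rng.standard_normal((M, 2 * D, 2 * D))
covariances = A @ A.transpose(0, 2, 1) + np.eye(2 * D)  # symmetric positive definite

src_means, tgt_means, covarXX, covarXY, covarYX, covarYY = split_joint_gmm(
    means, covariances
)
print(src_means.shape, covarXY.shape)  # (4, 3) (4, 3, 3)
```

Because each joint covariance matrix is symmetric, `covarYX[m]` is the transpose of `covarXY[m]` for every mixture `m`.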

Examples

>>> from sklearn.mixture import GaussianMixture
>>> from nnmnkwii.baseline.gmm import MLPG
>>> import numpy as np
>>> static_dim, T = 24, 10
>>> windows = [
...     (0, 0, np.array([1.0])),
...     (1, 1, np.array([-0.5, 0.0, 0.5])),
...     (1, 1, np.array([1.0, -2.0, 1.0])),
... ]
>>> src = np.random.rand(T, static_dim * len(windows))
>>> tgt = np.random.rand(T, static_dim * len(windows))
>>> XY = np.concatenate((src, tgt), axis=-1) # pseudo parallel data
>>> gmm = GaussianMixture(n_components=4)
>>> _ = gmm.fit(XY)
>>> paramgen = MLPG(gmm, windows=windows)
>>> generated = paramgen.transform(src)
>>> assert generated.shape == (T, static_dim)
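Each window in the example above is a tuple (left width, right width, coefficients), and the input features stack one windowed copy of the static features per window, which is why `src` has `static_dim * len(windows)` columns. The sketch below shows one way such features could be built from static features. The helper `apply_windows` is hypothetical, and repeating the edge frames as padding is an assumption of this sketch; the library may treat sequence boundaries differently.

```python
import numpy as np


def apply_windows(static, windows):
    """Stack windowed copies of `static` (T, D) into (T, D * len(windows))."""
    T, D = static.shape
    stacked = []
    for left, right, coef in windows:
        # Assumption: repeat the first/last frame so every frame has neighbours.
        padded = np.pad(static, ((left, right), (0, 0)), mode="edge")
        feat = np.zeros((T, D))
        for k, c in enumerate(coef):
            # padded[k + t] corresponds to static[t + k - left]
            feat += c * padded[k:k + T]
        stacked.append(feat)
    return np.concatenate(stacked, axis=-1)


windows = [
    (0, 0, np.array([1.0])),             # static
    (1, 1, np.array([-0.5, 0.0, 0.5])),  # delta
    (1, 1, np.array([1.0, -2.0, 1.0])),  # delta-delta
]
static = np.arange(5.0).reshape(5, 1)  # a linear ramp with one static dimension
features = apply_windows(static, windows)
print(features.shape)  # (5, 3)
```

For the linear ramp, the delta column is 1 and the delta-delta column is 0 at every interior frame, as expected for a constant slope.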

transform(src)

Parameters:	src (array) – shape (number of frames, order of spectral feature). Sequence of the source speaker’s spectral features to be transformed.
Returns:	Sequence of transformed features.
Return type:	array
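To give some intuition for what a GMM-based mapping computes, the sketch below implements the simpler frame-wise conversion, in which each output frame is the posterior-weighted conditional mean of the target given the source. This is not the library's `transform` implementation: MLPG additionally uses the windows to account for dynamic features across frames. The functions `log_gaussian` and `convert_framewise` and the toy parameters are hypothetical.

```python
import numpy as np


def log_gaussian(x, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of x (T, d)."""
    d = x.shape[1]
    diff = x - mean
    sol = np.linalg.solve(cov, diff.T).T  # rows of cov^{-1} (x - mean)
    maha = np.sum(diff * sol, axis=1)     # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def convert_framewise(x, weights, src_means, tgt_means, covarXX, covarYX):
    """Frame-wise conversion: y_t = sum_m P(m | x_t) * E[y | x_t, m]."""
    M = len(weights)
    # Posterior P(m | x_t) under the marginal GMM of the source features
    log_post = np.stack(
        [np.log(weights[m]) + log_gaussian(x, src_means[m], covarXX[m])
         for m in range(M)],
        axis=1,
    )
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    y = np.zeros((x.shape[0], tgt_means.shape[1]))
    for m in range(M):
        # E[y | x, m] = mu_y + Sigma_yx Sigma_xx^{-1} (x - mu_x)
        centered = x - src_means[m]
        cond_mean = tgt_means[m] + np.linalg.solve(covarXX[m], centered.T).T @ covarYX[m].T
        y += post[:, [m]] * cond_mean
    return y


# Toy single-mixture model, for which the mapping reduces to y = 1 + 0.5 * x
weights = np.array([1.0])
src_means = np.zeros((1, 2))
tgt_means = np.ones((1, 2))
covarXX = np.eye(2)[None]
covarYX = 0.5 * np.eye(2)[None]

x = np.array([[2.0, 0.0]])
y = convert_framewise(x, weights, src_means, tgt_means, covarXX, covarYX)
print(y)  # [[2. 1.]]
```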