Baseline
Generic baseline algorithms that can be used as building blocks.
GMM voice conversion
class nnmnkwii.baseline.gmm.MLPG(gmm, windows=None, swap=False, diff=False)
Maximum Likelihood Parameter Generation (MLPG) for GMM-based voice conversion [1].
Notes
Source speaker’s feature:
X = {x_t}, 0 <= t < T
Target speaker’s feature:
Y = {y_t}, 0 <= t < T
where T is the number of time frames.
See paper [1] for details.
The code was adapted from https://gist.github.com/r9y9/88bda659c97f46f42525.
- Parameters
gmm (sklearn.mixture.GaussianMixture) – Gaussian Mixture Models of source and target joint features.
windows (list) – List of windows. See nnmnkwii.functions.mlpg() for details.
swap (bool) – If True, source -> target, otherwise target -> source.
diff (bool) – Convert GMM -> DIFFGMM if True.
- Attributes
weights (array) – shape (num_mixtures,); weights for each Gaussian.
src_means (array) – shape (num_mixtures, order of spectral feature); means of the GMM for the source speaker.
tgt_means (array) – shape (num_mixtures, order of spectral feature); means of the GMM for the target speaker.
covarXX (array) – shape (num_mixtures, order of spectral feature, order of spectral feature); variance matrix of the source speaker's spectral feature.
covarXY (array) – shape (num_mixtures, order of spectral feature, order of spectral feature); covariance matrix of the source and target speakers' spectral features.
covarYX (array) – shape (num_mixtures, order of spectral feature, order of spectral feature); covariance matrix of the target and source speakers' spectral features.
covarYY (array) – shape (num_mixtures, order of spectral feature, order of spectral feature); variance matrix of the target speaker's spectral feature.
D (array) – shape (num_mixtures, order of spectral feature, order of spectral feature); covariance matrices of the target static spectral features.
px – Gaussian mixture model of the source speaker's features.
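The attribute shapes above suggest that the joint GMM's parameters are partitioned into source and target blocks. The following is a minimal sketch of such a partition, not the library's code. It assumes the joint feature is ordered as [source, target] and that the arrays have the layout of `means_` and `covariances_` from a full-covariance `sklearn.mixture.GaussianMixture`. The random arrays and the sizes `num_mixtures` and `order` are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
num_mixtures, order = 4, 3  # hypothetical sizes, for illustration only

# Stand-ins for gmm.means_ and gmm.covariances_ of a joint GMM
# (assumption: joint feature = [source, target], covariance_type="full").
means = rng.normal(size=(num_mixtures, 2 * order))
A = rng.normal(size=(num_mixtures, 2 * order, 2 * order))
covariances = A @ A.transpose(0, 2, 1) + 1e-3 * np.eye(2 * order)

# Means: first half is the source part, second half the target part.
src_means = means[:, :order]
tgt_means = means[:, order:]

# Covariances: four (order x order) blocks per mixture component.
covarXX = covariances[:, :order, :order]
covarXY = covariances[:, :order, order:]
covarYX = covariances[:, order:, :order]
covarYY = covariances[:, order:, order:]

print(src_means.shape, covarXY.shape)
```

Because each joint covariance matrix is symmetric, `covarYX[m]` is the transpose of `covarXY[m]` in this sketch.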
Examples
>>> from sklearn.mixture import GaussianMixture
>>> from nnmnkwii.baseline.gmm import MLPG
>>> import numpy as np
>>> static_dim, T = 24, 10
>>> windows = [
...     (0, 0, np.array([1.0])),
...     (1, 1, np.array([-0.5, 0.0, 0.5])),
...     (1, 1, np.array([1.0, -2.0, 1.0])),
... ]
>>> src = np.random.rand(T, static_dim * len(windows))
>>> tgt = np.random.rand(T, static_dim * len(windows))
>>> XY = np.concatenate((src, tgt), axis=-1)  # pseudo parallel data
>>> gmm = GaussianMixture(n_components=4)
>>> _ = gmm.fit(XY)
>>> paramgen = MLPG(gmm, windows=windows)
>>> generated = paramgen.transform(src)
>>> assert generated.shape == (T, static_dim)
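The example feeds features of dimension `static_dim * len(windows)`, which suggests that each window contributes one block of static or dynamic (delta) features. As a rough illustration of how such features could be built from static ones, here is a sketch that reads each window as a tuple of (left width, right width, coefficients). That reading, the helper name `apply_windows`, and the edge padding at sequence boundaries are assumptions made for this sketch; see nnmnkwii.functions.mlpg() for the authoritative window definition.

```python
import numpy as np

windows = [
    (0, 0, np.array([1.0])),             # static
    (1, 1, np.array([-0.5, 0.0, 0.5])),  # delta
    (1, 1, np.array([1.0, -2.0, 1.0])),  # delta-delta
]


def apply_windows(static, windows):
    """Hypothetical helper: stack one windowed copy of `static` per window.

    static: array of shape (T, static_dim).
    Returns an array of shape (T, static_dim * len(windows)).
    """
    T = static.shape[0]
    blocks = []
    for left, right, coef in windows:
        # Repeat the edge frames so every frame has a full context
        # (an assumption; other boundary treatments are possible).
        padded = np.pad(static, ((left, right), (0, 0)), mode="edge")
        # Weighted sum over the frames t-left .. t+right.
        block = sum(coef[k] * padded[k:k + T] for k in range(left + right + 1))
        blocks.append(block)
    return np.concatenate(blocks, axis=1)


T, static_dim = 6, 2
static = np.arange(T * static_dim, dtype=float).reshape(T, static_dim)
feats = apply_windows(static, windows)
print(feats.shape)
```

For a linear ramp such as `static` here, the delta block is constant and the delta-delta block is zero away from the boundaries, which is a quick sanity check on the coefficients.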
[1] [Toda 2007] Voice Conversion Based on Maximum Likelihood Estimation of Spectral Parameter Trajectory.
transform(src)
Maps source features x to target features y so as to maximize the likelihood of y given x.
- Parameters
src (array) – shape (number of frames, order of spectral feature); the sequence of the source speaker's spectral features to be transformed.
- Returns
a sequence of transformed features
- Return type
array
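To give some intuition for mapping x to y with a joint GMM, the sketch below implements the simpler frame-wise conditional-expectation mapping, where each frame is converted independently as the posterior-weighted sum of per-component conditional means. This is not the trajectory-based MLPG performed by transform(), which also uses dynamic features. The function names `log_gaussian` and `convert_framewise` and the toy parameters are hypothetical.

```python
import numpy as np


def log_gaussian(x, mean, cov):
    """Log density of N(mean, cov) at each row of x, shape (T,)."""
    d = x.shape[1]
    diff = x - mean
    sol = np.linalg.solve(cov, diff.T).T  # rows are cov^-1 (x_t - mean)
    maha = np.sum(diff * sol, axis=1)     # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def convert_framewise(src, weights, src_means, tgt_means, covarXX, covarYX):
    """Frame-wise E[y | x] under a joint GMM (illustrative sketch)."""
    num_mixtures = len(weights)
    # Posterior P(m | x_t), computed in the log domain for stability.
    log_post = np.stack(
        [np.log(weights[m]) + log_gaussian(src, src_means[m], covarXX[m])
         for m in range(num_mixtures)],
        axis=1,
    )
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    out = np.zeros((src.shape[0], tgt_means.shape[1]))
    for m in range(num_mixtures):
        # E[y | x, m] = mu_y + Sigma_yx Sigma_xx^-1 (x - mu_x)
        whitened = np.linalg.solve(covarXX[m], (src - src_means[m]).T).T
        cond_mean = tgt_means[m] + whitened @ covarYX[m].T
        out += post[:, [m]] * cond_mean
    return out


# Toy single-component model in which y = [1, 2] + 2 * x holds exactly.
rng = np.random.default_rng(0)
src = rng.normal(size=(5, 2))
weights = np.array([1.0])
src_means = np.zeros((1, 2))
tgt_means = np.array([[1.0, 2.0]])
covarXX = np.eye(2)[None]
covarYX = 2.0 * np.eye(2)[None]
converted = convert_framewise(src, weights, src_means, tgt_means, covarXX, covarYX)
print(converted.shape)
```

With a single component the posterior is 1 for every frame, so the output reduces to the linear map encoded by the means and covariance blocks.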