Baseline
Generic baseline algorithms that can be used as building blocks.
GMM voice conversion
class nnmnkwii.baseline.gmm.MLPG(gmm, windows=None, swap=False, diff=False)
Maximum Likelihood Parameter Generation (MLPG) for GMM-based voice conversion [R2].
Notes
- Source speaker’s feature:
X = {x_t}, 0 <= t < T
- Target speaker’s feature:
Y = {y_t}, 0 <= t < T
where T is the number of time frames.
See the paper [R2] for details.
The code was adapted from https://gist.github.com/r9y9/88bda659c97f46f42525.
Parameters:
- gmm (sklearn.mixture.GaussianMixture) – Gaussian mixture model of joint source and target features.
- windows (list) – List of windows. See nnmnkwii.functions.mlpg() for details.
- swap (bool) – If True, source -> target, otherwise target -> source.
- diff (bool) – Convert GMM -> DIFFGMM if True.
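As an illustration of what the windows parameter describes, the sketch below applies static, delta, and delta-delta windows to a sequence of static features using NumPy only. It assumes each window is a tuple of (left width, right width, coefficients); the helper name apply_windows and the edge-padding choice are hypothetical and may differ from what the library does internally.

```python
import numpy as np

# Assumed window format: (left width, right width, coefficients).
windows = [
    (0, 0, np.array([1.0])),             # static
    (1, 1, np.array([-0.5, 0.0, 0.5])),  # delta
    (1, 1, np.array([1.0, -2.0, 1.0])),  # delta-delta
]


def apply_windows(static, windows):
    """Stack windowed copies of `static` along the feature axis.

    Hypothetical helper for illustration; not part of nnmnkwii.
    """
    T, _ = static.shape
    outputs = []
    for left, right, coef in windows:
        # Repeat the edge frames so every frame has a full window of context
        # (an assumption of this sketch).
        padded = np.pad(static, ((left, right), (0, 0)), mode="edge")
        out = np.zeros_like(static)
        for k, c in enumerate(coef):
            out += c * padded[k:k + T]
        outputs.append(out)
    return np.concatenate(outputs, axis=-1)


static = np.arange(10, dtype=float).reshape(5, 2)  # T=5 frames, 2 dims
features = apply_windows(static, windows)          # shape (5, 2 * 3)
```

With three windows, the feature dimension is three times the static dimension, which matches the static_dim * len(windows) layout used for src and tgt in the example on this page.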
- num_mixtures (int) – The number of Gaussian mixtures.
- weights (array) – shape (num_mixtures), weights for each Gaussian.
- src_means (array) – shape (num_mixtures, order of spectral feature), means of the GMM for the source speaker.
- tgt_means (array) – shape (num_mixtures, order of spectral feature), means of the GMM for the target speaker.
- covarXX (array) – shape (num_mixtures, order of spectral feature, order of spectral feature), covariance matrix of the source speaker’s spectral feature.
- covarXY (array) – shape (num_mixtures, order of spectral feature, order of spectral feature), cross-covariance matrix of the source and target speakers’ spectral features.
- covarYX (array) – shape (num_mixtures, order of spectral feature, order of spectral feature), cross-covariance matrix of the target and source speakers’ spectral features.
- covarYY (array) – shape (num_mixtures, order of spectral feature, order of spectral feature), covariance matrix of the target speaker’s spectral feature.
- D (array) – shape (num_mixtures, order of spectral feature, order of spectral feature), covariance matrices of the target static spectral features.
- px (sklearn.mixture.GaussianMixture) – Gaussian mixture model of the source speaker’s features.
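The relationship between the joint GMM and the block attributes above can be sketched as follows. This is a minimal NumPy illustration, not the library's implementation: random arrays stand in for the means and covariances of a full-covariance joint GMM, it assumes the source features occupy the first half of each joint vector, and conditional_mean is a hypothetical helper showing the standard per-mixture Gaussian conditional mean.

```python
import numpy as np

rng = np.random.default_rng(0)
num_mixtures, D = 4, 3  # D: feature order of one speaker

# Stand-ins for the means and full covariances of a joint GMM.
means = rng.normal(size=(num_mixtures, 2 * D))
A = rng.normal(size=(num_mixtures, 2 * D, 2 * D))
covariances = A @ A.transpose(0, 2, 1) + 1e-3 * np.eye(2 * D)

# Assumption: source occupies the first D dimensions, target the last D.
src_means, tgt_means = means[:, :D], means[:, D:]
covarXX = covariances[:, :D, :D]
covarXY = covariances[:, :D, D:]
covarYX = covariances[:, D:, :D]
covarYY = covariances[:, D:, D:]


def conditional_mean(x, m):
    """E[y | x, m] = mu_y + S_yx S_xx^{-1} (x - mu_x) for mixture m."""
    return tgt_means[m] + covarYX[m] @ np.linalg.solve(
        covarXX[m], x - src_means[m]
    )


# At the source mean, the conditional mean equals the target mean.
y_hat = conditional_mean(src_means[0], 0)
```

Because each joint covariance is symmetric, covarYX is the per-mixture transpose of covarXY.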
Examples
>>> from sklearn.mixture import GaussianMixture
>>> from nnmnkwii.baseline.gmm import MLPG
>>> import numpy as np
>>> static_dim, T = 24, 10
>>> windows = [
...     (0, 0, np.array([1.0])),
...     (1, 1, np.array([-0.5, 0.0, 0.5])),
...     (1, 1, np.array([1.0, -2.0, 1.0])),
... ]
>>> src = np.random.rand(T, static_dim * len(windows))
>>> tgt = np.random.rand(T, static_dim * len(windows))
>>> XY = np.concatenate((src, tgt), axis=-1)  # pseudo parallel data
>>> gmm = GaussianMixture(n_components=4)
>>> _ = gmm.fit(XY)
>>> paramgen = MLPG(gmm, windows=windows)
>>> generated = paramgen.transform(src)
>>> assert generated.shape == (T, static_dim)
[R2] [Toda 2007] Voice Conversion Based on Maximum Likelihood Estimation of Spectral Parameter Trajectory.
transform(src)
Map source features x to target features y so as to maximize the likelihood of y given x.
Parameters: src (array) – shape (number of frames, order of spectral feature), the sequence of the source speaker’s spectral features to be transformed.
Returns: the sequence of transformed features.
Return type: array