nnmnkwii.functions.mlpg

nnmnkwii.functions.mlpg(mean_frames, variance_frames, windows)

Maximum Likelihood Parameter Generation (MLPG)

Function f: (T, D) -> (T, static_dim), where D = static_dim * len(windows) as in the example below.

It performs the Maximum Likelihood Parameter Generation (MLPG) algorithm to generate static features from static + dynamic features over time frames. The implementation is heavily inspired by [R5] and uses bandmat for efficient computation.

[R5] M. Shannon, supervised by W. Byrne (2014). Probabilistic acoustic modelling for parametric speech synthesis. PhD thesis, University of Cambridge, UK.
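The computation described above can be sketched with dense NumPy matrices. This is a naive reference for illustration only, not the library's bandmat-based implementation. The names `build_window_matrix` and `dense_mlpg` are hypothetical. The sketch assumes that the feature columns are laid out window by window (static block first, then one block per dynamic window) and that window coefficients falling outside the sequence are dropped at the boundaries. For each static dimension it solves the weighted least-squares system (W^T P W) c = W^T P mu, where W stacks the window matrices, P holds the inverse variances, and mu holds the predicted means.

```python
import numpy as np


def build_window_matrix(window, T):
    """Dense (T, T) matrix that applies one (l, u, win_coeff) window to a length-T sequence."""
    left, right, coeff = window
    assert len(coeff) == left + right + 1
    W = np.zeros((T, T))
    for t in range(T):
        for k, c in enumerate(coeff):
            tau = t - left + k  # frame touched by the k-th coefficient
            if 0 <= tau < T:    # assumption: out-of-range coefficients are dropped
                W[t, tau] = c
    return W


def dense_mlpg(mean_frames, variance_frames, windows):
    """Hypothetical dense MLPG sketch: (T, D) -> (T, static_dim)."""
    T, D = mean_frames.shape
    static_dim = D // len(windows)
    if variance_frames.ndim == 1:
        # Global variances: repeat the same row for every frame.
        variance_frames = np.tile(variance_frames, (T, 1))
    window_mats = [build_window_matrix(w, T) for w in windows]

    out = np.zeros((T, static_dim))
    for d in range(static_dim):
        A = np.zeros((T, T))  # accumulates W^T P W
        b = np.zeros(T)       # accumulates W^T P mu
        for w, W in enumerate(window_mats):
            col = w * static_dim + d  # assumed window-major column layout
            precision = 1.0 / variance_frames[:, col]
            A += W.T @ (precision[:, None] * W)
            b += W.T @ (precision * mean_frames[:, col])
        out[:, d] = np.linalg.solve(A, b)
    return out
```

The dense solve costs O(T^3) per dimension. Because each window touches only a few neighbouring frames, W^T P W is banded, which is the structure a banded-matrix library such as bandmat can exploit.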
Parameters:
  • mean_frames (2darray) – The input features (static + delta). In statistical speech synthesis, these are the means of Gaussian distributions predicted by neural networks or decision trees.
  • variance_frames (2d or 1darray) – Variances (static + delta) of Gaussian distributions, given either per time frame (2d) or as global variances (1d). Global variances are expanded over all frames.
  • windows (list) – A sequence of (l, u, win_coeff) triples, where l and u are non-negative integers specifying the left and right extents of the window and win_coeff is an array specifying the window coefficients.
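As a minimal sketch of the parameter conventions above, the snippet below checks a set of window triples and expands a 1d global variance to one row per frame. The check `len(win_coeff) == l + u + 1` is an assumption consistent with the windows used in the example on this page. Using `np.tile` for the expansion is likewise an assumption about what "expanded over frames" amounts to, not a statement about the library's internals.

```python
import numpy as np

T, static_dim = 10, 24
windows = [
    (0, 0, np.array([1.0])),             # static
    (1, 1, np.array([-0.5, 0.0, 0.5])),  # delta
    (1, 1, np.array([1.0, -2.0, 1.0])),  # delta-delta
]
D = static_dim * len(windows)

# Each window should cover l frames to the left, the current frame,
# and u frames to the right (assumed convention).
for l, u, win_coeff in windows:
    assert l >= 0 and u >= 0
    assert len(win_coeff) == l + u + 1

# A 1d global variance has one value per feature dimension ...
global_variance = np.random.rand(D)
# ... and is repeated for every frame to obtain a (T, D) array.
expanded = np.tile(global_variance, (T, 1))
```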
Returns:

Generated static features over time, as an array of shape (T, static_dim).

Examples

>>> import numpy as np
>>> from nnmnkwii import functions as F
>>> windows = [
...         (0, 0, np.array([1.0])),            # static
...         (1, 1, np.array([-0.5, 0.0, 0.5])), # delta
...         (1, 1, np.array([1.0, -2.0, 1.0])), # delta-delta
...     ]
>>> T, static_dim = 10, 24
>>> mean_frames = np.random.rand(T, static_dim * len(windows))
>>> variance_frames = np.random.rand(T, static_dim * len(windows))
>>> static_features = F.mlpg(mean_frames, variance_frames, windows)
>>> assert static_features.shape == (T, static_dim)