nnmnkwii.preprocessing.modspec_smoothing

nnmnkwii.preprocessing.modspec_smoothing(x, modfs, n=4096, norm=None, cutoff=50, log_domain=True)

Parameter trajectory smoothing by removing high frequency bands of MS.

Given a parameter trajectory, it removes the high frequency bands of its modulation spectrum (MS).

In HMM-based speech synthesis, the MS components in high MS frequency bands are known to have a negligible effect on the quality of analysis-synthesized speech. See [1] for details.

[1] Takamichi, Shinnosuke, et al. “The NAIST text-to-speech system for the Blizzard Challenge 2015.” Proc. Blizzard Challenge workshop. 2015.
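The idea of removing high MS frequency bands can be illustrated with a minimal NumPy sketch. This is not the library's implementation: the function name lowpass_trajectory_sketch is hypothetical, it works only on the linear spectrum (it does not reproduce the log_domain option), and it assumes a simple hard cut-off with zero-padding to the DFT length.

```python
import numpy as np


def lowpass_trajectory_sketch(x, modfs, n=4096, cutoff=50):
    """Hypothetical illustration: zero out DFT bins above `cutoff` Hz.

    x is a (T, D) trajectory; each of the D dimensions is filtered
    independently along the time axis.
    """
    T, _ = x.shape
    if n < T:
        raise ValueError("DFT length n must be at least the number of frames T")
    # One-sided DFT along time; frames are zero-padded up to length n.
    spec = np.fft.rfft(x, n=n, axis=0)
    # Frequency in Hz of each bin, given the modulation sampling rate.
    freqs = np.fft.rfftfreq(n, d=1.0 / modfs)
    # Remove every component above the cut-off frequency.
    spec[freqs > cutoff] = 0
    # Back to the time domain, then drop the zero-padded tail.
    return np.fft.irfft(spec, n=n, axis=0)[:T]


x = np.random.rand(10, 2)
smoothed = lowpass_trajectory_sketch(x, modfs=200, n=16, cutoff=50)
smoothed.shape  # (10, 2)
```

The output keeps the input shape, so a sketch like this can be applied to a generated trajectory before waveform synthesis. For actual use, prefer modspec_smoothing itself.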

Parameters
  • x (numpy.ndarray) – Parameter trajectory, shape (T x D).

  • modfs (int) – Sampling frequency in modulation spectrum domain. In frame-based processing, this will be fs / hop_length.

  • n (int) – DFT length.

  • norm (str) – Normalization mode. See numpy.fft.fft().

  • cutoff (float) – Cut-off frequency in Hz.

  • log_domain (bool) – Whether to perform the high frequency band removal in the log modulation spectrum domain.
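The relationship between modfs, n, and cutoff can be worked through with plain arithmetic. The values of fs and hop_length below are assumptions chosen for illustration (16 kHz audio with a 5 ms frame shift), and the bin calculation is generic DFT arithmetic rather than a description of the function's internals.

```python
fs = 16000        # assumed audio sampling rate in Hz
hop_length = 80   # assumed frame shift in samples (5 ms at 16 kHz)

# Sampling frequency of the parameter trajectory (frames per second).
modfs = fs / hop_length        # 200.0

# Highest modulation frequency that can be represented.
nyquist = modfs / 2            # 100.0

# Width of one DFT bin in Hz for the default DFT length.
n = 4096
bin_width = modfs / n          # 0.048828125

# DFT bin that corresponds to the default cut-off frequency.
cutoff = 50
cutoff_bin = int(round(cutoff / bin_width))  # 1024
```

With these assumed values, a cutoff of 50 Hz keeps the lower half of the representable modulation frequency range.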

Returns

Smoothed parameter trajectory, shape (T x D).

Return type

numpy.ndarray

Examples

>>> import numpy as np
>>> from nnmnkwii import preprocessing as P
>>> generated = np.random.rand(10, 2)
>>> smoothed = P.modspec_smoothing(generated, modfs=200, n=16, cutoff=50)
>>> smoothed.shape
(10, 2)