Autograd

Differentiable functions for PyTorch. This may be extended to support other autograd frameworks.

Currently none of the functions has a CUDA implementation; this should be addressed later.

Functional interface

mlpg(mean_frames, variance_frames, windows) Maximum Likelihood Parameter Generation (MLPG).
modspec(y[, n, norm]) Modulation spectrum computation.
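
A minimal usage sketch of the functional interface (not from the library docs; the shapes, the window tuples in nnmnkwii's (left, right, coefficients) convention, and the use of a recent PyTorch with requires_grad are illustrative assumptions):

    import numpy as np
    import torch
    from nnmnkwii.autograd import mlpg

    T, static_dim = 100, 24
    # Static + delta windows; exact values are placeholders.
    windows = [
        (0, 0, np.array([1.0])),
        (1, 1, np.array([-0.5, 0.0, 0.5])),
    ]
    D = static_dim * len(windows)

    mean_frames = torch.randn(T, D, requires_grad=True)
    variance_frames = torch.ones(T, D)

    y = mlpg(mean_frames, variance_frames, windows)  # (T, static_dim)
    y.sum().backward()                               # gradients reach mean_frames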

Function classes

class nnmnkwii.autograd.MLPG(static_dim, variance_frames, windows)[source]

MLPG as an autograd function f : (T, D) -> (T, static_dim).

This is meant to be used for Minimum Generation Error (MGE) training for speech synthesis and voice conversion. See [R1] for details.

[R1] Wu, Zhizheng, and Simon King. “Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features.” INTERSPEECH, 2015.

Let \(d\) be the index of static features and \(l\) be the index of windows. The gradients \(g_{d,l}\) can be computed by:

\[g_{d,l} = (\sum_{l} W_{l}^{T}P_{d,l}W_{l})^{-1} W_{l}^{T}P_{d,l}\]

where \(W_{l}\) is a banded window matrix and \(P_{d,l}\) is a diagonal precision matrix.

Assuming the variances are diagonal, MLPG can be performed efficiently dimension-by-dimension.

Let \(o_{d}\) be the T-dimensional back-propagated gradients; the resulting gradients \(g'_{d,l}\) to be propagated are computed as follows:

\[g'_{d,l} = o_{d}^{T} g_{d,l}\]
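
For illustration, the per-dimension gradient above can be sketched in NumPy as follows (the helper name is ours, and \(W_{l}\) is materialized as a dense (T, T) matrix here, ignoring the banded structure an efficient implementation would exploit):

    import numpy as np

    def mlpg_grad_per_dim(W_list, precision_list):
        # W_list: list of (T, T) window matrices W_l.
        # precision_list: length-T vectors, the diagonals of P_{d,l}.
        T = W_list[0].shape[1]
        # R = sum_l W_l^T P_{d,l} W_l
        R = np.zeros((T, T))
        for W_l, p_l in zip(W_list, precision_list):
            R += W_l.T @ np.diag(p_l) @ W_l
        R_inv = np.linalg.inv(R)
        # g_{d,l} = R^{-1} W_l^T P_{d,l}, one (T, T) matrix per window l
        return [R_inv @ W_l.T @ np.diag(p_l)
                for W_l, p_l in zip(W_list, precision_list)]
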
static_dim

int – number of static dimensions

variance_frames

torch.FloatTensor – Variances same as in nnmnkwii.functions.mlpg().

windows

list – same as in nnmnkwii.functions.mlpg().

Todo

CUDA implementation
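
The snippet below sketches how MLPG-based MGE training might look, using the functional mlpg() interface above; the toy model, data, and loss are placeholders, not the exact recipe from [R1]:

    import numpy as np
    import torch
    from nnmnkwii.autograd import mlpg

    windows = [(0, 0, np.array([1.0])),
               (1, 1, np.array([-0.5, 0.0, 0.5]))]   # placeholder window set
    T, static_dim = 100, 24
    D = static_dim * len(windows)

    model = torch.nn.Linear(20, D)            # toy acoustic model
    x = torch.randn(T, 20)                    # toy input features
    y_target = torch.randn(T, static_dim)     # toy target static features
    variances = torch.ones(T, D)

    means = model(x)                          # predicted mean trajectories, (T, D)
    y_hat = mlpg(means, variances, windows)   # generated static features, (T, static_dim)
    loss = torch.nn.functional.mse_loss(y_hat, y_target)  # trajectory error
    loss.backward()                           # gradients flow back through MLPG to the model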

class nnmnkwii.autograd.ModSpec(n=2048, norm=None)[source]

Modulation spectrum computation f : (T, D) -> (N//2+1, D).

n

int – DFT length.

norm

bool – Whether to normalize the DFT output. See numpy.fft.fft.
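
A minimal usage sketch, using the functional modspec() interface listed above (sizes are illustrative and a recent PyTorch is assumed):

    import torch
    from nnmnkwii.autograd import modspec

    T, D, n = 100, 24, 2048
    y = torch.randn(T, D, requires_grad=True)
    ms = modspec(y, n=n, norm=None)  # modulation spectrum, (n // 2 + 1, D)
    ms.sum().backward()              # gradients flow back to y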