Pre-processing

Feature transformation, feature alignment and feature normalization.

Generic

Utterance-wise operations

mulaw(x[, mu]) Mu-Law companding
inv_mulaw(y[, mu]) Inverse of mu-law companding (mu-law expansion)
mulaw_quantize(x[, mu]) Mu-Law companding + quantize
inv_mulaw_quantize(y[, mu]) Inverse of mu-law companding + quantize
preemphasis(x[, coef]) Pre-emphasis
inv_preemphasis(x[, coef]) Inverse operation of pre-emphasis
delta_features(x, windows) Compute delta features and combine them.
trim_zeros_frames(x[, eps]) Remove trailing zero frames.
remove_zeros_frames(x[, eps]) Remove zero frames.
adjust_frame_length(x[, pad, divisible_by]) Adjust frame length given a feature vector or matrix.
adjust_frame_lengths(x, y[, pad, …]) Adjust frame lengths given two feature vectors or matrices.
scale(x, data_mean, data_std) Mean/variance scaling.
inv_scale(x, data_mean, data_std) Inverse transform of mean/variance scaling.
minmax_scale_params(data_min, data_max[, …]) Compute parameters required to perform min/max scaling.
minmax_scale(x[, data_min, data_max, …]) Min/max scaling of a single data array.
inv_minmax_scale(x[, data_min, data_max, …]) Inverse transform of min/max scaling of a single data array.
modspec(x[, n, norm, return_phase]) Modulation spectrum (MS) computation
inv_modspec(ms, phase[, norm]) Inverse transform of modulation spectrum computation
modspec_smoothing(x, modfs[, n, norm, …]) Parameter trajectory smoothing by removing high frequency bands of MS.
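Several of the utterance-wise entries above follow standard signal-processing formulas. Mu-law companding maps x in [-1, 1] to sign(x) * log(1 + mu|x|) / log(1 + mu), and pre-emphasis is the first-order filter y[n] = x[n] - coef * x[n-1]. The NumPy sketch below implements those formulas and their inverses under assumed defaults (mu=255, coef=0.97) and an assumed quantization range of [0, mu]. The function names only mirror the table; this is not nnmnkwii's source, so check the library's signatures for its actual defaults.

```python
import numpy as np

# Illustrative sketch only; these are not nnmnkwii's implementations.
# Defaults mu=255 and coef=0.97 are assumptions.


def mulaw(x, mu=255):
    """Mu-law companding: maps x in [-1, 1] to y in [-1, 1]."""
    x = np.asarray(x, dtype=np.float64)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)


def inv_mulaw(y, mu=255):
    """Mu-law expansion: exact inverse of mulaw()."""
    y = np.asarray(y, dtype=np.float64)
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu


def mulaw_quantize(x, mu=255):
    """Companding, then rounding to integer codes in [0, mu]."""
    y = mulaw(x, mu)
    return np.floor((y + 1.0) / 2.0 * mu + 0.5).astype(np.int64)


def inv_mulaw_quantize(q, mu=255):
    """Map integer codes back to [-1, 1], then expand."""
    y = 2.0 * np.asarray(q, dtype=np.float64) / mu - 1.0
    return inv_mulaw(y, mu)


def preemphasis(x, coef=0.97):
    """FIR filter y[n] = x[n] - coef * x[n-1], with y[0] = x[0]."""
    x = np.asarray(x, dtype=np.float64)
    y = x.copy()
    y[1:] = x[1:] - coef * x[:-1]
    return y


def inv_preemphasis(y, coef=0.97):
    """IIR inverse of preemphasis(): x[n] = y[n] + coef * x[n-1]."""
    y = np.asarray(y, dtype=np.float64)
    x = np.empty_like(y)
    prev = 0.0
    for n, v in enumerate(y):
        prev = v + coef * prev
        x[n] = prev
    return x


# Round trips: companding and pre-emphasis are exactly invertible,
# quantization loses at most a small amount of precision.
x = np.linspace(-1.0, 1.0, 11)
print(np.allclose(inv_mulaw(mulaw(x)), x))
print(np.allclose(inv_preemphasis(preemphasis(x)), x))
print(mulaw_quantize(x).min(), mulaw_quantize(x).max())
```

Companding spends more of the quantization levels on small amplitudes, which is why the quantized round trip is most accurate near zero.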

Dataset-wise operations

meanvar(dataset[, lengths, mean_, var_, …]) Mean/variance computation given an iterable dataset
meanstd(dataset[, lengths, mean_, var_, …]) Mean/std-deviation computation given an iterable dataset
minmax(dataset[, lengths]) Min/max computation given an iterable dataset
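Dataset-wise statistics are typically computed over padded arrays, and the `lengths` argument suggests why it matters: padding frames would otherwise bias the statistics toward zero. The sketch below computes per-dimension mean/variance and min/max over the valid frames of a padded (N x T x D) array and applies the matching normalizations. The helper names `meanvar_padded`, `minmax_padded`, `standardize` and `minmax_normalize` are hypothetical and are not nnmnkwii's API.

```python
import numpy as np

# Hypothetical helpers illustrating the dataset-wise statistics;
# not nnmnkwii's meanvar()/minmax()/scale()/minmax_scale().


def _valid_frames(X, lengths):
    """Stack the first lengths[i] frames of every utterance: (sum(lengths), D)."""
    return np.concatenate([x[:n] for x, n in zip(X, lengths)], axis=0)


def meanvar_padded(X, lengths):
    """Per-dimension mean and variance, ignoring padding frames."""
    frames = _valid_frames(X, lengths)
    return frames.mean(axis=0), frames.var(axis=0)


def minmax_padded(X, lengths):
    """Per-dimension min and max, ignoring padding frames."""
    frames = _valid_frames(X, lengths)
    return frames.min(axis=0), frames.max(axis=0)


def standardize(x, data_mean, data_std):
    """Mean/variance scaling (assumes no dimension has zero variance)."""
    return (x - data_mean) / data_std


def minmax_normalize(x, data_min, data_max, feature_range=(0.0, 1.0)):
    """Linearly map [data_min, data_max] onto feature_range per dimension."""
    lo, hi = feature_range
    return lo + (x - data_min) * (hi - lo) / (data_max - data_min)


# Usage: three utterances zero-padded to T=10 frames, D=2.
rng = np.random.default_rng(0)
X = np.zeros((3, 10, 2))
lengths = [10, 7, 4]
for i, n in enumerate(lengths):
    X[i, :n] = rng.normal(loc=5.0, scale=2.0, size=(n, 2))

mean, var = meanvar_padded(X, lengths)
naive_mean = X.reshape(-1, 2).mean(axis=0)  # includes 9 padding frames
print(mean, naive_mean)  # the naive estimate is pulled toward zero
```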

F0

F0-specific pre-processing algorithms.

interp1d(f0[, kind]) Continuous F0 interpolation from a discontinuous F0 trajectory
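The idea behind continuous F0 interpolation can be sketched as follows, assuming (as is common) that unvoiced frames are marked with F0 = 0. Each unvoiced frame is filled by linear interpolation between the surrounding voiced frames, and frames before the first or after the last voiced frame take the nearest voiced value. `continuous_f0` is a hypothetical name; the library's `kind` argument presumably selects the interpolation method, whereas this sketch is linear only.

```python
import numpy as np


def continuous_f0(f0):
    """Fill unvoiced frames (f0 == 0) from neighbouring voiced frames.

    Hypothetical helper. Interior gaps are linearly interpolated; leading and
    trailing gaps are held at the nearest voiced value, which is how np.interp
    behaves outside the range of its sample points.
    """
    f0 = np.asarray(f0, dtype=np.float64)
    voiced = np.flatnonzero(f0 > 0)
    if voiced.size == 0:
        return f0.copy()  # fully unvoiced: nothing to interpolate from
    t = np.arange(len(f0))
    return np.interp(t, voiced, f0[voiced])


print(continuous_f0([0, 100, 0, 0, 130, 0]))
# [100. 100. 110. 120. 130. 130.]
```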

Alignment

Alignment algorithms. This is typically useful for creating parallel data in statistical voice conversion.

Currently, only high-level APIs are provided: they take a tuple of unnormalized, padded data arrays (N x T x D) as input and return padded, aligned arrays of the same shape. To align a single pair of feature matrices (rather than a dataset), use fastdtw directly instead.
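Aligning a single pair amounts to computing a DTW warping path and indexing both matrices along it. The `fastdtw` package's `fastdtw()` returns a (distance, path) pair using an approximate, radius-limited search; because that package may not be installed, the sketch below substitutes an exact O(Tx * Ty) dynamic-programming DTW. Since the inputs here are padded, trailing all-zero frames are trimmed first. `trim_trailing_zero_frames`, `dtw_path` and `align_pair` are hypothetical helpers, and the Euclidean frame distance mirrors the numpy.linalg.norm default described below for DTWAligner.

```python
import numpy as np


def trim_trailing_zero_frames(x, eps=1e-7):
    """Drop trailing frames whose absolute values sum to at most eps (padding)."""
    energy = np.abs(x).sum(axis=1)
    nonzero = np.flatnonzero(energy > eps)
    return x[: nonzero[-1] + 1] if nonzero.size else x[:0]


def dtw_path(x, y, dist=lambda a, b: np.linalg.norm(a - b)):
    """Exact DTW between x (Tx x D) and y (Ty x D); returns (cost, path)."""
    tx, ty = len(x), len(y)
    acc = np.full((tx + 1, ty + 1), np.inf)  # accumulated cost, 1-indexed
    acc[0, 0] = 0.0
    for i in range(1, tx + 1):
        for j in range(1, ty + 1):
            d = dist(x[i - 1], y[j - 1])
            acc[i, j] = d + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from (tx, ty) to (0, 0), preferring the diagonal on ties.
    i, j, path = tx, ty, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return acc[tx, ty], path[::-1]


def align_pair(x, y):
    """Trim padding, run DTW, and return both matrices warped to the path."""
    x, y = trim_trailing_zero_frames(x), trim_trailing_zero_frames(y)
    _, path = dtw_path(x, y)
    ix, iy = zip(*path)
    return x[list(ix)], y[list(iy)]


x = np.array([[1.0], [2.0], [3.0], [0.0], [0.0]])  # 3 frames + 2 padding
y = np.array([[1.0], [1.0], [2.0], [2.0], [3.0]])
xa, ya = align_pair(x, y)
print(xa.ravel(), ya.ravel())  # both become [1. 1. 2. 2. 3.]
```

After warping, both matrices have the length of the path, so each row of one corresponds frame by frame to the same row of the other, which is what parallel training data requires.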

class nnmnkwii.preprocessing.alignment.DTWAligner(dist=<function DTWAligner.<lambda>>, radius=1, verbose=0)

Align feature matrices using fastdtw.

dist

function – Distance function between two feature frames. Default is the Euclidean distance, computed with numpy.linalg.norm().

radius

int – Radius parameter in fastdtw.

verbose

int – Verbose flag. Default is 0.

Examples

>>> from nnmnkwii.util import example_file_data_sources_for_duration_model
>>> from nnmnkwii.datasets import FileSourceDataset
>>> from nnmnkwii.preprocessing.alignment import DTWAligner
>>> _, X = example_file_data_sources_for_duration_model()
>>> X = FileSourceDataset(X).asarray()
>>> X.shape
(3, 40, 5)
>>> Y = X.copy()
>>> X_aligned, Y_aligned = DTWAligner().transform((X, Y))
>>> X_aligned.shape
(3, 40, 5)
>>> Y_aligned.shape
(3, 40, 5)
class nnmnkwii.preprocessing.alignment.IterativeDTWAligner(n_iter=3, dist=<function IterativeDTWAligner.<lambda>>, radius=1, max_iter_gmm=100, n_components_gmm=16, verbose=0)

Align feature matrices iteratively using GMM-based feature conversion.

n_iter

int – Number of iterations.

dist

function – Distance function.

radius

int – Radius parameter in fastdtw.

verbose

int – Verbose flag. Default is 0.

max_iter_gmm

int – Maximum number of iterations for GMM training.

n_components_gmm

int – Number of mixture components in GMM.

Examples

>>> from nnmnkwii.util import example_file_data_sources_for_duration_model
>>> from nnmnkwii.datasets import FileSourceDataset
>>> from nnmnkwii.preprocessing.alignment import IterativeDTWAligner
>>> _, X = example_file_data_sources_for_duration_model()
>>> X = FileSourceDataset(X).asarray()
>>> X.shape
(3, 40, 5)
>>> Y = X.copy()
>>> X_aligned, Y_aligned = IterativeDTWAligner(n_iter=1).transform((X, Y))
>>> X_aligned.shape
(3, 40, 5)
>>> Y_aligned.shape
(3, 40, 5)