Pre-processing
Feature transformation, feature alignment and feature normalization.
Generic
Utterance-wise operations
mulaw(x[, mu]) | Mu-law companding.
inv_mulaw(y[, mu]) | Inverse of mu-law companding (mu-law expansion).
mulaw_quantize(x[, mu]) | Mu-law companding followed by quantization.
inv_mulaw_quantize(y[, mu]) | Inverse of mu-law companding and quantization.
preemphasis(x[, coef]) | Pre-emphasis.
inv_preemphasis(x[, coef]) | Inverse operation of pre-emphasis.
delta_features(x, windows) | Compute delta features and combine them.
trim_zeros_frames(x[, eps]) | Remove trailing zero frames.
remove_zeros_frames(x[, eps]) | Remove zero frames.
adjust_frame_length(x[, pad, divisible_by]) | Adjust the frame length of a feature vector or matrix.
adjust_frame_lengths(x, y[, pad, …]) | Adjust the frame lengths of two feature vectors or matrices.
scale(x, data_mean, data_std) | Mean/variance scaling.
inv_scale(x, data_mean, data_std) | Inverse transform of mean/variance scaling.
minmax_scale_params(data_min, data_max[, …]) | Compute the parameters required to perform min/max scaling.
minmax_scale(x[, data_min, data_max, …]) | Min/max scaling of a single data array.
inv_minmax_scale(x[, data_min, data_max, …]) | Inverse transform of min/max scaling of a single data array.
modspec(x[, n, norm, return_phase]) | Modulation spectrum (MS) computation.
inv_modspec(ms, phase[, norm]) | Inverse transform of modulation spectrum computation.
modspec_smoothing(x, modfs[, n, norm, …]) | Parameter trajectory smoothing by removing high-frequency bands of the MS.
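To make the first few table entries concrete, here is a minimal sketch of the standard mu-law companding/quantization formulas and first-order pre-emphasis in plain NumPy. These are not nnmnkwii's implementations: the function names are hypothetical, and the defaults mu=255 and coef=0.97, as well as the mapping of the companded range [-1, 1] onto the integers 0..mu, are assumptions of the sketch.

```python
import numpy as np

# Hypothetical standalone helpers illustrating the usual formulas.
# mu=255 and coef=0.97 are assumed defaults, not necessarily nnmnkwii's.


def mu_compress(x, mu=255):
    """Mu-law companding: map x in [-1, 1] to [-1, 1] with log compression."""
    x = np.asarray(x, dtype=np.float64)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)


def mu_expand(y, mu=255):
    """Inverse of mu_compress (mu-law expansion)."""
    y = np.asarray(y, dtype=np.float64)
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu


def mu_quantize(x, mu=255):
    """Compand, then map [-1, 1] linearly onto the integers 0..mu (assumed mapping)."""
    y = mu_compress(x, mu)
    return np.rint((y + 1.0) / 2.0 * mu).astype(np.int64)


def mu_dequantize(q, mu=255):
    """Map integers 0..mu back to [-1, 1], then expand."""
    y = 2.0 * np.asarray(q, dtype=np.float64) / mu - 1.0
    return mu_expand(y, mu)


def pre_emph(x, coef=0.97):
    """First-order high-pass filter: y[n] = x[n] - coef * x[n-1]."""
    x = np.asarray(x, dtype=np.float64)
    y = x.copy()
    y[1:] -= coef * x[:-1]
    return y


def de_emph(y, coef=0.97):
    """Undo pre_emph with the recursion x[n] = y[n] + coef * x[n-1]."""
    x = np.zeros(len(y))
    prev = 0.0
    for n, value in enumerate(y):
        prev = value + coef * prev
        x[n] = prev
    return x
```

Round-tripping through mu_compress and mu_expand recovers the input up to floating-point error, while the quantized path loses at most half a quantization step in the companded domain, which translates into larger absolute errors near |x| = 1 than near 0.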
F0
F0-specific pre-processing algorithms.
interp1d(f0[, kind]) | Continuous F0 interpolation from a discontinuous F0 trajectory.
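The kind argument of interp1d suggests the interpolation method is selectable. As an illustration of the linear case only, the sketch below fills unvoiced frames from neighbouring voiced frames with numpy.interp. It assumes unvoiced frames are marked with 0, which is a common convention but an assumption here, and the helper name is hypothetical rather than part of nnmnkwii.

```python
import numpy as np


def continuous_f0_linear(f0):
    """Hypothetical helper: fill unvoiced frames (f0 == 0, assumed marker) by
    linear interpolation between neighbouring voiced frames.

    Frames before the first voiced frame or after the last one take the
    nearest voiced value, because np.interp holds edge values constant.
    """
    f0 = np.asarray(f0, dtype=np.float64)
    voiced = f0 > 0
    if not voiced.any():
        # Nothing to interpolate from; return the all-unvoiced input unchanged
        return f0.copy()
    frames = np.arange(len(f0))
    return np.interp(frames, frames[voiced], f0[voiced])
```

For example, continuous_f0_linear([0, 100, 0, 0, 130, 0]) returns [100, 100, 110, 120, 130, 130]: the two interior gaps are filled on the straight line between 100 and 130, and both edges copy the nearest voiced value.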
Alignment
Alignment algorithms. These are typically useful for creating parallel data in statistical voice conversion.
Currently, only high-level APIs are provided. They take as input a tuple of
unnormalized, padded data arrays (N x T x D)
and return padded, aligned arrays of the same shape. If you want to align
a single pair of feature matrices (rather than a dataset), use fastdtw
directly instead.
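For the single-pair case the text above points to fastdtw. To show what such an alignment produces without depending on fastdtw, the sketch below computes an exact DTW path in NumPy, an O(T1 x T2) stand-in for fastdtw's approximate search, using Euclidean frame distance. The function names are hypothetical. Unlike DTWAligner, which returns arrays padded back to the input shape, align_pair returns arrays whose length equals the path length.

```python
import numpy as np


def dtw_path(x, y):
    """Exact DTW between x (T1 x D) and y (T2 x D) with Euclidean frame distance.

    Returns a list of (i, j) frame-index pairs from (0, 0) to (T1-1, T2-1).
    """
    t1, t2 = len(x), len(y)
    # Pairwise frame distances, shape (T1, T2)
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    # cost[i, j] = minimal accumulated distance aligning x[:i] with y[:j]
    cost = np.full((t1 + 1, t2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Backtrack from the end, preferring the diagonal step on ties
    i, j, path = t1, t2, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda ij: cost[ij])
    return path[::-1]


def align_pair(x, y):
    """Repeat frames of x and y along the DTW path so both have equal length."""
    path = np.array(dtw_path(x, y))
    return x[path[:, 0]], y[path[:, 1]]
```

With x = [[0], [1], [2]] and y = [[0], [0], [1], [2]], the path is [(0, 0), (0, 1), (1, 2), (2, 3)]: the first source frame is repeated to absorb the duplicated target frame, and the two aligned arrays become identical.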
class nnmnkwii.preprocessing.alignment.DTWAligner(dist=<function DTWAligner.<lambda>>, radius=1, verbose=0)
Align feature matrices using fastdtw.
dist (function) – Distance function. Default is numpy.linalg.norm().
verbose (int) – Verbose flag. Default is 0.
Examples
>>> from nnmnkwii.util import example_file_data_sources_for_duration_model
>>> from nnmnkwii.datasets import FileSourceDataset
>>> from nnmnkwii.preprocessing.alignment import DTWAligner
>>> _, X = example_file_data_sources_for_duration_model()
>>> X = FileSourceDataset(X).asarray()
>>> X.shape
(3, 40, 5)
>>> Y = X.copy()
>>> X_aligned, Y_aligned = DTWAligner().transform((X, Y))
>>> X_aligned.shape
(3, 40, 5)
>>> Y_aligned.shape
(3, 40, 5)
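DTWAligner.transform takes a tuple of padded (N x T x D) arrays; in the example above the padded array comes from FileSourceDataset(X).asarray(). If your features are instead held in memory as a list of variable-length matrices, a padded array with the same layout can be built, and a padded row trimmed back, with plain NumPy. Both helpers below are hypothetical. The assumption that padding frames are all zeros follows the "padded data arrays" wording of this section, and trim_trailing_zero_frames only mirrors the idea behind trim_zeros_frames in the table; it is not that function's implementation.

```python
import numpy as np


def pad_utterances(feats):
    """Stack (T_i x D) arrays into an (N x T_max x D) array, zero-padded at the end."""
    t_max = max(len(f) for f in feats)
    out = np.zeros((len(feats), t_max, feats[0].shape[1]), dtype=np.float64)
    for n, f in enumerate(feats):
        out[n, : len(f)] = f
    return out


def trim_trailing_zero_frames(x, eps=0.0):
    """Drop trailing frames whose absolute values are all <= eps."""
    keep = np.flatnonzero(np.any(np.abs(x) > eps, axis=1))
    return x[: keep[-1] + 1] if keep.size else x[:0]
```

One caveat of zero-padding as a length marker: an utterance whose genuine final frames are all zeros would be over-trimmed, so storing the original lengths alongside the padded array is the safer choice when that can happen.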
class nnmnkwii.preprocessing.alignment.IterativeDTWAligner(n_iter=3, dist=<function IterativeDTWAligner.<lambda>>, radius=1, max_iter_gmm=100, n_components_gmm=16, verbose=0)
Align feature matrices iteratively using GMM-based feature conversion.
n_iter (int) – Number of iterations.
dist (function) – Distance function.
verbose (int) – Verbose flag. Default is 0.
max_iter_gmm (int) – Maximum number of iterations for GMM training.
n_components_gmm (int) – Number of mixture components in the GMM.
Examples
>>> from nnmnkwii.util import example_file_data_sources_for_duration_model
>>> from nnmnkwii.datasets import FileSourceDataset
>>> from nnmnkwii.preprocessing.alignment import IterativeDTWAligner
>>> _, X = example_file_data_sources_for_duration_model()
>>> X = FileSourceDataset(X).asarray()
>>> X.shape
(3, 40, 5)
>>> Y = X.copy()
>>> X_aligned, Y_aligned = IterativeDTWAligner(n_iter=1).transform((X, Y))
>>> X_aligned.shape
(3, 40, 5)
>>> Y_aligned.shape
(3, 40, 5)
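The class description suggests an alternating scheme: align, fit a source-to-target conversion on the aligned frames, convert the source, and align again. The sketch below illustrates that loop for a single pair under two explicit substitutions: the GMM-based conversion (controlled by n_components_gmm and max_iter_gmm in the real class) is replaced by one affine least-squares map, and fastdtw is replaced by exact DTW, so it runs with NumPy alone. The function names are hypothetical, and this is not IterativeDTWAligner's code.

```python
import numpy as np


def dtw_path(x, y):
    """Exact DTW path (array of (i, j) rows) with Euclidean frame distance."""
    d = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    c = np.full((len(x) + 1, len(y) + 1), np.inf)
    c[0, 0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            c[i, j] = d[i - 1, j - 1] + min(c[i - 1, j - 1], c[i - 1, j], c[i, j - 1])
    # Backtrack from the end, preferring the diagonal step on ties
    i, j, path = len(x), len(y), []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda ij: c[ij])
    return np.array(path[::-1])


def iterative_align(x, y, n_iter=3):
    """Alternate DTW with an affine source-to-target fit (stand-in for a GMM)."""
    x1 = np.hstack([x, np.ones((len(x), 1))])  # source frames plus a bias column
    path = dtw_path(x, y)  # initial alignment on the raw features
    for _ in range(n_iter):
        # Fit y ~ [x, 1] @ w on the currently aligned frame pairs
        w = np.linalg.lstsq(x1[path[:, 0]], y[path[:, 1]], rcond=None)[0]
        # Realign the converted source against the target
        path = dtw_path(x1 @ w, y)
    # Return the original source frames, ordered by the final path
    return x[path[:, 0]], y[path[:, 1]]
```

Only the path benefits from the conversion; the returned source frames are the unconverted originals, so the output is still a pair of raw source and target features, just better matched in time.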