Pre-processing
Feature transformation, feature alignment and feature normalization.
Generic
Utterance-wise operations
mulaw (x[, mu]) | Mu-law companding |
inv_mulaw (y[, mu]) | Inverse of mu-law companding (mu-law expansion) |
mulaw_quantize (x[, mu]) | Mu-law companding + quantize |
inv_mulaw_quantize (y[, mu]) | Inverse of mu-law companding + quantize |
preemphasis (x[, coef]) | Pre-emphasis |
inv_preemphasis (x[, coef]) | Inverse operation of pre-emphasis |
delta_features (x, windows) | Compute delta features and combine them. |
trim_zeros_frames (x[, eps]) | Remove trailing zero frames. |
remove_zeros_frames (x[, eps]) | Remove zero frames. |
adjust_frame_length (x[, pad, divisible_by]) | Adjust frame length given a feature vector or matrix. |
adjust_frame_lengths (x, y[, pad, …]) | Adjust frame lengths given two feature vectors or matrices. |
scale (x, data_mean, data_std) | Mean/variance scaling. |
inv_scale (x, data_mean, data_std) | Inverse transform of mean/variance scaling. |
minmax_scale_params (data_min, data_max[, …]) | Compute parameters required to perform min/max scaling. |
minmax_scale (x[, data_min, data_max, …]) | Min/max scaling of a single data array. |
inv_minmax_scale (x[, data_min, data_max, …]) | Inverse transform of min/max scaling of a single data array. |
modspec (x[, n, norm, return_phase]) | Modulation spectrum (MS) computation |
inv_modspec (ms, phase[, norm]) | Inverse transform of modulation spectrum computation |
modspec_smoothing (x, modfs[, n, norm, …]) | Parameter trajectory smoothing by removing high frequency bands of MS. |
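To make the mu-law entries in the table above concrete, here is a minimal sketch of the textbook companding formula on a single sample, using only the standard library. The function names mirror the table, but the bodies are an illustration, not the library's implementation (which operates on arrays). The value mu=255 and the truncating quantizer are assumptions made for this sketch.

```python
import math


def mulaw(x, mu=255):
    """Compress a sample x in [-1, 1]: sign(x) * ln(1 + mu|x|) / ln(1 + mu)."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def inv_mulaw(y, mu=255):
    """Expand a companded value y in [-1, 1]: sign(y) * ((1 + mu)^|y| - 1) / mu."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def mulaw_quantize(x, mu=255):
    """Compand, then map [-1, 1] onto the integers 0..mu.

    Truncation toward zero is an assumption of this sketch.
    """
    return int((mulaw(x, mu) + 1) / 2 * mu)


def inv_mulaw_quantize(q, mu=255):
    """Map an integer in 0..mu back to [-1, 1], then expand."""
    return inv_mulaw(2 * q / mu - 1, mu)


sample = 0.3
restored = inv_mulaw_quantize(mulaw_quantize(sample))
```

Companding alone is invertible up to floating-point error, while the quantized round trip only recovers the sample approximately, with finer resolution near zero than near full scale.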
F0
F0-specific pre-processing algorithms.
interp1d (f0[, kind]) | Continuous F0 interpolation from a discontinuous F0 trajectory |
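As an illustration of what continuous F0 interpolation means, the sketch below fills unvoiced frames (marked here by zeros) by linear interpolation between the neighbouring voiced frames. The helper name interp_f0_linear is hypothetical, linear interpolation is only one possible choice of kind, and holding the first and last voiced values at the edges is an assumption of this sketch rather than documented behaviour of interp1d.

```python
def interp_f0_linear(f0):
    """Fill unvoiced (zero) frames by linear interpolation between voiced frames."""
    voiced = [i for i, v in enumerate(f0) if v > 0]
    out = [float(v) for v in f0]
    if not voiced:
        # Nothing to interpolate from: return the trajectory unchanged.
        return out

    first, last = voiced[0], voiced[-1]
    # Assumption: hold the nearest voiced value before the first and
    # after the last voiced frame.
    for i in range(first):
        out[i] = out[first]
    for i in range(last + 1, len(out)):
        out[i] = out[last]

    # Interpolate linearly inside each gap between consecutive voiced frames.
    for a, b in zip(voiced, voiced[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = (1 - t) * out[a] + t * out[b]
    return out


f0 = [0, 100, 0, 0, 130, 0]
continuous_f0 = interp_f0_linear(f0)
```

For the trajectory above, the two unvoiced frames between 100 and 130 are filled with evenly spaced values, and the leading and trailing unvoiced frames take the nearest voiced value.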
Alignment
Alignment algorithms. These are typically useful for creating parallel data in statistical voice conversion.
Currently, there are only high-level APIs that take a tuple of
unnormalized, padded data arrays (N x T x D) as input
and return padded, aligned arrays of the same shape. If you want to
align a single pair of feature matrices (not a dataset), use fastdtw
directly instead.
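For readers who want to see what aligning a single pair of feature matrices involves, here is a minimal sketch of exact dynamic time warping in plain Python. It is a stand-in for illustration only: fastdtw computes an approximation of this alignment more efficiently, and the helper name dtw_path and the Euclidean default distance are assumptions of the sketch.

```python
import math


def dtw_path(x, y, dist=math.dist):
    """Exact DTW between two sequences of feature vectors.

    Returns the accumulated cost and the warping path as (i, j) index pairs.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j] is the best accumulated cost of aligning x[:i] with y[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(x[i - 1], y[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    path.reverse()
    return acc[n][m], path


x = [[0.0], [1.0], [2.0]]
y = [[0.0], [0.0], [1.0], [2.0]]
cost, path = dtw_path(x, y)

# Repeat frames along the path so both sequences have the same length.
x_aligned = [x[i] for i, _ in path]
y_aligned = [y[j] for _, j in path]
```

Indexing both sequences along the warping path yields two equally long, frame-aligned sequences, which is the form parallel data for voice conversion needs.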
class nnmnkwii.preprocessing.alignment.DTWAligner(dist=<function DTWAligner.<lambda>>, radius=1, verbose=0)
Align feature matrices using fastdtw.
dist (function) – Distance function. Default is numpy.linalg.norm().
verbose (int) – Verbose flag. Default is 0.
Examples
>>> from nnmnkwii.util import example_file_data_sources_for_duration_model
>>> from nnmnkwii.datasets import FileSourceDataset
>>> from nnmnkwii.preprocessing.alignment import DTWAligner
>>> _, X = example_file_data_sources_for_duration_model()
>>> X = FileSourceDataset(X).asarray()
>>> X.shape
(3, 40, 5)
>>> Y = X.copy()
>>> X_aligned, Y_aligned = DTWAligner().transform((X, Y))
>>> X_aligned.shape
(3, 40, 5)
>>> Y_aligned.shape
(3, 40, 5)
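The arrays in this example are zero-padded to a common number of frames. The sketch below shows one way the trailing padding of a single utterance could be stripped before alignment and restored afterwards, in the spirit of trim_zeros_frames from the table of utterance-wise operations. Both helper names and the eps threshold are hypothetical, and whether DTWAligner handles padding in exactly this way is not stated here.

```python
def trim_trailing_zero_frames(x, eps=1e-7):
    """Drop trailing frames whose summed absolute value is below eps."""
    end = len(x)
    while end > 0 and sum(abs(v) for v in x[end - 1]) < eps:
        end -= 1
    return x[:end]


def pad_frames(x, length):
    """Zero-pad a list of frames (T x D) at the end up to `length` frames."""
    dim = len(x[0]) if x else 0
    return x + [[0.0] * dim for _ in range(length - len(x))]


padded = [[1.0, 2.0], [0.0, 0.0], [3.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
trimmed = trim_trailing_zero_frames(padded)  # interior zero frame is kept
restored = pad_frames(trimmed, len(padded))
```

Only trailing zero frames are removed, so a silent frame in the middle of an utterance survives the round trip.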
class nnmnkwii.preprocessing.alignment.IterativeDTWAligner(n_iter=3, dist=<function IterativeDTWAligner.<lambda>>, radius=1, max_iter_gmm=100, n_components_gmm=16, verbose=0)
Align feature matrices iteratively using GMM-based feature conversion.
n_iter (int) – Number of iterations.
dist (function) – Distance function.
verbose (int) – Verbose flag. Default is 0.
max_iter_gmm (int) – Maximum number of iterations to train the GMM.
n_components_gmm (int) – Number of mixture components in the GMM.
Examples
>>> from nnmnkwii.util import example_file_data_sources_for_duration_model
>>> from nnmnkwii.datasets import FileSourceDataset
>>> from nnmnkwii.preprocessing.alignment import IterativeDTWAligner
>>> _, X = example_file_data_sources_for_duration_model()
>>> X = FileSourceDataset(X).asarray()
>>> X.shape
(3, 40, 5)
>>> Y = X.copy()
>>> X_aligned, Y_aligned = IterativeDTWAligner(n_iter=1).transform((X, Y))
>>> X_aligned.shape
(3, 40, 5)
>>> Y_aligned.shape
(3, 40, 5)