Pre-processing¶
Feature transformation, feature alignment and feature normalization.
Generic¶
Utterance-wise operations¶
mulaw(x[, mu]) | Mu-Law companding
inv_mulaw(y[, mu]) | Inverse of mu-law companding (mu-law expansion)
mulaw_quantize(x[, mu]) | Mu-Law companding + quantize
inv_mulaw_quantize(y[, mu]) | Inverse of mu-law companding + quantize
preemphasis(x[, coef]) | Pre-emphasis
inv_preemphasis(x[, coef]) | Inverse operation of pre-emphasis
delta_features(x, windows) | Compute delta features and combine them.
trim_zeros_frames(x[, eps]) | Remove trailing zero frames.
remove_zeros_frames(x[, eps]) | Remove zero frames.
adjust_frame_length(x[, pad, divisible_by]) | Adjust frame length given a feature vector or matrix.
adjust_frame_lengths(x, y[, pad, …]) | Adjust frame lengths given two feature vectors or matrices.
scale(x, data_mean, data_std) | Mean/variance scaling.
inv_scale(x, data_mean, data_std) | Inverse transform of mean/variance scaling.
minmax_scale_params(data_min, data_max[, …]) | Compute parameters required to perform min/max scaling.
minmax_scale(x[, data_min, data_max, …]) | Min/max scaling of a single data array.
inv_minmax_scale(x[, data_min, data_max, …]) | Inverse transform of min/max scaling of a single data array.
modspec(x[, n, norm, return_phase]) | Modulation spectrum (MS) computation
inv_modspec(ms, phase[, norm]) | Inverse transform of modulation spectrum computation
modspec_smoothing(x, modfs[, n, norm, …]) | Parameter trajectory smoothing by removing high frequency bands of MS.
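As an illustration of the first entries in the table above, mu-law companding compresses a sample x in [-1, 1] as sign(x) * ln(1 + mu*|x|) / ln(1 + mu), and the quantized variant maps the result to an integer class. The following is a minimal pure-Python sketch of those formulas, not the library's implementation: the `*_sketch` function names, the scalar (rather than array) interface, the default `mu=255`, and the mapping of [-1, 1] onto classes 0..mu are all assumptions made for illustration.

```python
import math


def mulaw_sketch(x, mu=255):
    """Compress one sample in [-1, 1] (illustrative, scalar-only)."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def inv_mulaw_sketch(y, mu=255):
    """Expand a companded sample back to the linear domain."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


def mulaw_quantize_sketch(x, mu=255):
    """Compand, then map [-1, 1] to an integer class in [0, mu] (assumed convention)."""
    y = mulaw_sketch(x, mu)
    return int((y + 1.0) / 2.0 * mu)


def inv_mulaw_quantize_sketch(q, mu=255):
    """Map an integer class back to [-1, 1], then expand."""
    y = 2.0 * q / mu - 1.0
    return inv_mulaw_sketch(y, mu)


if __name__ == "__main__":
    sample = 0.5
    companded = mulaw_sketch(sample)
    print(companded, inv_mulaw_sketch(companded))
    label = mulaw_quantize_sketch(sample)
    print(label, inv_mulaw_quantize_sketch(label))
```

Companding followed by expansion recovers the input up to floating-point error, whereas the quantized round trip is lossy: the reconstruction error is bounded by the width of the class the sample fell into.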
F0¶
F0-specific pre-processing algorithms.
interp1d(f0[, kind]) | Continuous F0 interpolation from a discontinuous F0 trajectory
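To show what turning a discontinuous F0 trajectory into a continuous one means, here is a small plain-Python sketch using linear interpolation. It is not the library's `interp1d`: the function name is hypothetical, and it assumes that unvoiced frames are marked by a value of 0 and that leading and trailing unvoiced frames take the nearest voiced value. The library's behaviour for each `kind` may differ.

```python
def interp_f0_linear_sketch(f0):
    """Fill unvoiced (zero) frames by linear interpolation between voiced frames."""
    voiced = [i for i, value in enumerate(f0) if value > 0]
    out = list(f0)
    if not voiced:
        # Nothing to interpolate from; return the input unchanged.
        return out

    first, last = voiced[0], voiced[-1]
    # Assumed boundary rule: hold the nearest voiced value at both ends.
    for i in range(first):
        out[i] = f0[first]
    for i in range(last + 1, len(f0)):
        out[i] = f0[last]

    # Interpolate linearly inside each gap between consecutive voiced frames.
    for a, b in zip(voiced, voiced[1:]):
        for i in range(a + 1, b):
            w = (i - a) / (b - a)
            out[i] = (1.0 - w) * f0[a] + w * f0[b]
    return out


if __name__ == "__main__":
    print(interp_f0_linear_sketch([0.0, 100.0, 0.0, 0.0, 130.0, 0.0]))
```

Voiced frames are left untouched, so the original trajectory can always be recovered by re-applying the voiced/unvoiced mask.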
Alignment¶
Alignment algorithms. These are typically useful for creating parallel data in statistical voice conversion.
Currently, there are only high-level APIs that take a tuple of
unnormalized padded data arrays (each N x T x D) as input
and return padded aligned arrays with the same shape. If you want to
align a single pair of feature matrices (not a dataset), use fastdtw
directly instead.
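To make concrete what aligning a single pair of feature matrices involves, the following plain-Python sketch implements classic dynamic time warping. It is the exact quadratic-time algorithm, not fastdtw's approximation and not the code used by the aligner classes below; `euclidean` and `dtw_path_sketch` are hypothetical helper names, and each feature matrix is assumed to be a list of frames (T x D).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two frames of equal dimension."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw_path_sketch(x, y, dist=euclidean):
    """Return (total cost, warping path) for two non-empty frame sequences."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j]: best accumulated cost aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(x[i - 1], y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end to the start along the cheapest predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = []
        if i > 1 and j > 1:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 1:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 1:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    x = [[0.0], [1.0], [2.0]]
    y = [[0.0], [1.0], [1.0], [2.0]]
    total, path = dtw_path_sketch(x, y)
    # Index both sequences along the path to obtain equal-length aligned pairs.
    x_aligned = [x[i] for i, _ in path]
    y_aligned = [y[j] for _, j in path]
    print(total, path, len(x_aligned), len(y_aligned))
```

Indexing both inputs along the returned path yields two sequences of equal length, which is the form of parallel data that voice conversion training typically needs.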
class nnmnkwii.preprocessing.alignment.DTWAligner(dist=<function DTWAligner.<lambda>>, radius=1, verbose=0)¶
Align feature matrices using fastdtw.
- dist¶ function – Distance function. Default is numpy.linalg.norm().
- verbose¶ int – Verbose flag. Default is 0.
Examples
>>> from nnmnkwii.util import example_file_data_sources_for_duration_model
>>> from nnmnkwii.datasets import FileSourceDataset
>>> from nnmnkwii.preprocessing.alignment import DTWAligner
>>> _, X = example_file_data_sources_for_duration_model()
>>> X = FileSourceDataset(X).asarray()
>>> X.shape
(3, 40, 5)
>>> Y = X.copy()
>>> X_aligned, Y_aligned = DTWAligner().transform((X, Y))
>>> X_aligned.shape
(3, 40, 5)
>>> Y_aligned.shape
(3, 40, 5)
class nnmnkwii.preprocessing.alignment.IterativeDTWAligner(n_iter=3, dist=<function IterativeDTWAligner.<lambda>>, radius=1, max_iter_gmm=100, n_components_gmm=16, verbose=0)¶
Align feature matrices iteratively using GMM-based feature conversion.
- n_iter¶ int – Number of iterations.
- dist¶ function – Distance function.
- verbose¶ int – Verbose flag. Default is 0.
- max_iter_gmm¶ int – Maximum number of iterations used to train the GMM.
- n_components_gmm¶ int – Number of mixture components in the GMM.
Examples
>>> from nnmnkwii.util import example_file_data_sources_for_duration_model
>>> from nnmnkwii.datasets import FileSourceDataset
>>> from nnmnkwii.preprocessing.alignment import IterativeDTWAligner
>>> _, X = example_file_data_sources_for_duration_model()
>>> X = FileSourceDataset(X).asarray()
>>> X.shape
(3, 40, 5)
>>> Y = X.copy()
>>> X_aligned, Y_aligned = IterativeDTWAligner(n_iter=1).transform((X, Y))
>>> X_aligned.shape
(3, 40, 5)
>>> Y_aligned.shape
(3, 40, 5)