nnmnkwii 0.1.0
Index
A | B | C | D | E | F | H | I | L | M | N | P | R | S | T | U | V | W
A
adjust_frame_length() (in module nnmnkwii.preprocessing)
adjust_frame_lengths() (in module nnmnkwii.preprocessing)
append() (nnmnkwii.io.hts.HTSLabelFile method)
apply_each2d_padded() (in module nnmnkwii.util)
apply_each2d_trim() (in module nnmnkwii.util)
asarray() (nnmnkwii.datasets.FileSourceDataset method)
    (nnmnkwii.datasets.PaddedFileSourceDataset method)
B
backward() (nnmnkwii.autograd.MLPG static method)
    (nnmnkwii.autograd.ModSpec static method)
    (nnmnkwii.autograd.UnitVarianceMLPG static method)
build_win_mats() (in module nnmnkwii.paramgen)
C
cache_size (nnmnkwii.datasets.MemoryCacheDataset attribute)
    (nnmnkwii.datasets.MemoryCacheFramewiseDataset attribute)
cached_utterances (nnmnkwii.datasets.MemoryCacheDataset attribute)
    (nnmnkwii.datasets.MemoryCacheFramewiseDataset attribute)
cholesky_inv() (in module nnmnkwii.util.linalg)
cholesky_inv_banded() (in module nnmnkwii.util.linalg)
collect_features() (nnmnkwii.datasets.FileDataSource method)
collect_files() (nnmnkwii.datasets.cmu_arctic.WavFileDataSource method)
    (nnmnkwii.datasets.FileDataSource method)
    (nnmnkwii.datasets.jvs.TranscriptionDataSource method)
    (nnmnkwii.datasets.jvs.WavFileDataSource method)
    (nnmnkwii.datasets.ljspeech.TranscriptionDataSource method)
    (nnmnkwii.datasets.ljspeech.WavFileDataSource method)
    (nnmnkwii.datasets.vcc2016.WavFileDataSource method)
    (nnmnkwii.datasets.vctk.TranscriptionDataSource method)
    (nnmnkwii.datasets.vctk.WavFileDataSource method)
    (nnmnkwii.datasets.voice_statistics.TranscriptionDataSource method)
    (nnmnkwii.datasets.voice_statistics.WavFileDataSource method)
collected_files (nnmnkwii.datasets.FileSourceDataset attribute)
contexts (nnmnkwii.io.hts.HTSLabelFile attribute)
covarXX (nnmnkwii.baseline.gmm.MLPG attribute)
covarXY (nnmnkwii.baseline.gmm.MLPG attribute)
covarYX (nnmnkwii.baseline.gmm.MLPG attribute)
covarYY (nnmnkwii.baseline.gmm.MLPG attribute)
D
D (nnmnkwii.baseline.gmm.MLPG attribute)
Dataset (class in nnmnkwii.datasets)
dataset (nnmnkwii.datasets.MemoryCacheDataset attribute)
    (nnmnkwii.datasets.MemoryCacheFramewiseDataset attribute)
delta_features() (in module nnmnkwii.preprocessing)
dist (nnmnkwii.preprocessing.alignment.DTWAligner attribute)
    (nnmnkwii.preprocessing.alignment.IterativeDTWAligner attribute)
DTWAligner (class in nnmnkwii.preprocessing.alignment)
duration_features() (in module nnmnkwii.frontend.merlin)
E
end_times (nnmnkwii.io.hts.HTSLabelFile attribute)
example_audio_file() (in module nnmnkwii.util)
example_file_data_sources_for_acoustic_model() (in module nnmnkwii.util)
example_file_data_sources_for_duration_model() (in module nnmnkwii.util)
example_label_file() (in module nnmnkwii.util)
example_question_file() (in module nnmnkwii.util)
F
file_data_source (nnmnkwii.datasets.FileSourceDataset attribute)
    (nnmnkwii.datasets.PaddedFileSourceDataset attribute)
FileDataSource (class in nnmnkwii.datasets)
FileSourceDataset (class in nnmnkwii.datasets)
forward() (nnmnkwii.autograd.MLPG static method)
    (nnmnkwii.autograd.ModSpec static method)
    (nnmnkwii.autograd.UnitVarianceMLPG static method)
H
HTSLabelFile (class in nnmnkwii.io.hts)
I
interp1d() (in module nnmnkwii.preprocessing.f0)
inv_minmax_scale() (in module nnmnkwii.preprocessing)
inv_modspec() (in module nnmnkwii.preprocessing)
inv_mulaw() (in module nnmnkwii.preprocessing)
inv_mulaw_quantize() (in module nnmnkwii.preprocessing)
inv_preemphasis() (in module nnmnkwii.preprocessing)
inv_scale() (in module nnmnkwii.preprocessing)
IterativeDTWAligner (class in nnmnkwii.preprocessing.alignment)
L
labels (nnmnkwii.datasets.cmu_arctic.WavFileDataSource attribute)
    (nnmnkwii.datasets.jvs.TranscriptionDataSource attribute)
    (nnmnkwii.datasets.jvs.WavFileDataSource attribute)
    (nnmnkwii.datasets.vcc2016.WavFileDataSource attribute)
    (nnmnkwii.datasets.vctk.TranscriptionDataSource attribute)
    (nnmnkwii.datasets.vctk.WavFileDataSource attribute)
    (nnmnkwii.datasets.voice_statistics.WavFileDataSource attribute)
lf0_mean_squared_error() (in module nnmnkwii.metrics)
linguistic_features() (in module nnmnkwii.frontend.merlin)
load() (in module nnmnkwii.io.hts)
    (nnmnkwii.io.hts.HTSLabelFile method)
load_question_set() (in module nnmnkwii.io.hts)
M
max_iter_gmm (nnmnkwii.preprocessing.alignment.IterativeDTWAligner attribute)
mean_squared_error() (in module nnmnkwii.metrics)
meanstd() (in module nnmnkwii.preprocessing)
meanvar() (in module nnmnkwii.preprocessing)
melcd() (in module nnmnkwii.metrics)
MemoryCacheDataset (class in nnmnkwii.datasets)
MemoryCacheFramewiseDataset (class in nnmnkwii.datasets)
merlin_post_filter() (in module nnmnkwii.postfilters)
metadata (nnmnkwii.datasets.ljspeech.TranscriptionDataSource attribute)
    (nnmnkwii.datasets.ljspeech.WavFileDataSource attribute)
minmax() (in module nnmnkwii.preprocessing)
minmax_scale() (in module nnmnkwii.preprocessing)
minmax_scale_params() (in module nnmnkwii.preprocessing)
MLPG (class in nnmnkwii.autograd)
    (class in nnmnkwii.baseline.gmm)
mlpg() (in module nnmnkwii.autograd)
    (in module nnmnkwii.paramgen)
mlpg_grad() (in module nnmnkwii.paramgen)
ModSpec (class in nnmnkwii.autograd)
modspec() (in module nnmnkwii.autograd)
    (in module nnmnkwii.preprocessing)
modspec_smoothing() (in module nnmnkwii.preprocessing)
module
    nnmnkwii.autograd
    nnmnkwii.baseline.gmm
    nnmnkwii.datasets
    nnmnkwii.frontend.merlin
    nnmnkwii.io.hts
    nnmnkwii.metrics
    nnmnkwii.paramgen
    nnmnkwii.postfilters
    nnmnkwii.preprocessing
    nnmnkwii.preprocessing.alignment
    nnmnkwii.preprocessing.f0
    nnmnkwii.util
    nnmnkwii.util.linalg
mulaw() (in module nnmnkwii.preprocessing)
mulaw_quantize() (in module nnmnkwii.preprocessing)
N
n_components_gmm (nnmnkwii.preprocessing.alignment.IterativeDTWAligner attribute)
n_iter (nnmnkwii.preprocessing.alignment.IterativeDTWAligner attribute)
nnmnkwii.autograd
    module
nnmnkwii.baseline.gmm
    module
nnmnkwii.datasets
    module
nnmnkwii.frontend.merlin
    module
nnmnkwii.io.hts
    module
nnmnkwii.metrics
    module
nnmnkwii.paramgen
    module
nnmnkwii.postfilters
    module
nnmnkwii.preprocessing
    module
nnmnkwii.preprocessing.alignment
    module
nnmnkwii.preprocessing.f0
    module
nnmnkwii.util
    module
nnmnkwii.util.linalg
    module
num_mixtures (nnmnkwii.baseline.gmm.MLPG attribute)
num_states() (nnmnkwii.io.hts.HTSLabelFile method)
P
padded_length (nnmnkwii.datasets.PaddedFileSourceDataset attribute)
PaddedFileSourceDataset (class in nnmnkwii.datasets)
preemphasis() (in module nnmnkwii.preprocessing)
px (nnmnkwii.baseline.gmm.MLPG attribute)
R
radius (nnmnkwii.preprocessing.alignment.DTWAligner attribute)
    (nnmnkwii.preprocessing.alignment.IterativeDTWAligner attribute)
remove_zeros_frames() (in module nnmnkwii.preprocessing)
reshape_means() (in module nnmnkwii.paramgen)
S
scale() (in module nnmnkwii.preprocessing)
set_durations() (nnmnkwii.io.hts.HTSLabelFile method)
silence_frame_indices() (nnmnkwii.io.hts.HTSLabelFile method)
silence_label_indices() (nnmnkwii.io.hts.HTSLabelFile method)
silence_phone_indices() (nnmnkwii.io.hts.HTSLabelFile method)
speaker_info (nnmnkwii.datasets.jvs.TranscriptionDataSource attribute)
    (nnmnkwii.datasets.jvs.WavFileDataSource attribute)
    (nnmnkwii.datasets.vctk.TranscriptionDataSource attribute)
    (nnmnkwii.datasets.vctk.WavFileDataSource attribute)
src_means (nnmnkwii.baseline.gmm.MLPG attribute)
start_times (nnmnkwii.io.hts.HTSLabelFile attribute)
T
tgt_means (nnmnkwii.baseline.gmm.MLPG attribute)
TranscriptionDataSource (class in nnmnkwii.datasets.jsut)
    (class in nnmnkwii.datasets.jvs)
    (class in nnmnkwii.datasets.ljspeech)
    (class in nnmnkwii.datasets.vctk)
    (class in nnmnkwii.datasets.voice_statistics)
transform() (nnmnkwii.baseline.gmm.MLPG method)
trim_zeros_frames() (in module nnmnkwii.preprocessing)
U
unit_variance_mlpg() (in module nnmnkwii.autograd)
unit_variance_mlpg_matrix() (in module nnmnkwii.paramgen)
UnitVarianceMLPG (class in nnmnkwii.autograd)
V
verbose (nnmnkwii.preprocessing.alignment.DTWAligner attribute)
    (nnmnkwii.preprocessing.alignment.IterativeDTWAligner attribute)
vuv_error() (in module nnmnkwii.metrics)
W
WavFileDataSource (class in nnmnkwii.datasets.cmu_arctic)
    (class in nnmnkwii.datasets.jsut)
    (class in nnmnkwii.datasets.jvs)
    (class in nnmnkwii.datasets.ljspeech)
    (class in nnmnkwii.datasets.vcc2016)
    (class in nnmnkwii.datasets.vctk)
    (class in nnmnkwii.datasets.voice_statistics)
weights (nnmnkwii.baseline.gmm.MLPG attribute)
write_audacity_labels() (in module nnmnkwii.io.hts)
write_textgrid() (in module nnmnkwii.io.hts)