Datasets
========

.. automodule:: nnmnkwii.datasets

This module provides dataset abstractions. In this library, a dataset represents a fixed-size set of features (e.g., acoustic features, linguistic features, duration features) composed of multiple utterances, supporting iteration and indexing.

Interface
----------

To build datasets and represent a variety of features (linguistic, duration, acoustic, etc.) in a unified way, we define a couple of interfaces:

1. :obj:`FileDataSource`
2. :obj:`Dataset`

The former is an abstraction of file data sources: it specifies where to find the data and how to process it. Any FileDataSource must implement:

- ``collect_files``: specifies where to find source files (wav, lab, cmp, bin, etc.).
- ``collect_features``: specifies how to collect features (just load them from file, or run some feature extraction logic, etc.).

The latter is an abstraction of a dataset. Any dataset must implement the :obj:`Dataset` interface:

- ``__getitem__``: returns features (typically a two-dimensional :obj:`numpy.ndarray`).
- ``__len__``: returns the size of the dataset (e.g., the number of utterances).

One important point is that we use :obj:`numpy.ndarray` to represent features (though there may be exceptions). For example:

- F0 trajectory as a ``T x 1`` array, where ``T`` is the number of frames.
- Spectrogram as a ``T x D`` array, where ``D`` is the number of feature dimensions.
- Linguistic features as a ``T x D`` array.

.. autoclass:: FileDataSource
    :members:

.. autoclass:: Dataset
    :members:

Implementation
--------------

By combining :obj:`FileDataSource` and :obj:`Dataset`, we define some dataset implementations that can be used in typical situations.

.. note::

    We don't provide special iterator implementations (e.g., mini-batch iteration, multiprocessing, etc.). Users are expected to use the datasets with other iterator implementations. PyTorch users can use `PyTorch DataLoader`_ for mini-batch iteration and multiprocessing. Our dataset interface is *exactly* the same as PyTorch's, so PyTorch DataLoader can be used seamlessly. See the tutorials for practical usage.

.. _PyTorch DataLoader:

Dataset that supports utterance-wise iteration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: FileSourceDataset
    :members:

.. autoclass:: PaddedFileSourceDataset
    :members:

.. autoclass:: MemoryCacheDataset
    :members:

Dataset that supports frame-wise iteration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: MemoryCacheFramewiseDataset
    :members:

Builtin data sources
--------------------

There are a couple of built-in file data sources for typical datasets, to make them easy to work with. With the following data source implementations, you only need to implement ``collect_features``, which defines what features you want from a wav file or text (depending on the data source). If you want maximum flexibility in accessing a dataset, you may want to implement your own data source instead of using the built-in ones.

For example, if you want to extract acoustic features from CMU Arctic wav files, you can write:

.. code-block:: python

    from nnmnkwii.preprocessing import trim_zeros_frames
    from nnmnkwii.datasets import FileSourceDataset
    from nnmnkwii.datasets import cmu_arctic
    from scipy.io import wavfile
    import numpy as np
    import pysptk
    import pyworld

    class MyFileDataSource(cmu_arctic.WavFileDataSource):
        def __init__(self, data_root, speakers, max_files=100):
            super(MyFileDataSource, self).__init__(
                data_root, speakers, max_files=max_files)

        def collect_features(self, path):
            """Compute mel-cepstrum given a wav file."""
            # Read the waveform (scipy's wavfile.read is assumed here)
            fs, x = wavfile.read(path)
            x = x.astype(np.float64)
            f0, timeaxis = pyworld.dio(x, fs, frame_period=5)
            f0 = pyworld.stonemask(x, f0, timeaxis, fs)
            spectrogram = pyworld.cheaptrick(x, f0, timeaxis, fs)
            spectrogram = trim_zeros_frames(spectrogram)
            mc = pysptk.sp2mc(spectrogram, order=24, alpha=0.41)
            return mc.astype(np.float32)

    DATA_ROOT = "/home/ryuichi/data/cmu_arctic/"  # your data path
    data_source = MyFileDataSource(DATA_ROOT, speakers=["clb"], max_files=100)

    # 100 wav files of `clb` speaker will be collected
    X = FileSourceDataset(data_source)
    assert len(X) == 100

    for x in X:
        # do anything on acoustic features (e.g., save to disk)
        pass

More real examples can be found in the `tests directory`_ in nnmnkwii and the tutorial notebooks in `nnmnkwii_gallery`_.

.. _`tests directory`:

.. _`nnmnkwii_gallery`:

CMU Arctic (en)
---------------

You can download data from

.. autoclass:: nnmnkwii.datasets.cmu_arctic.WavFileDataSource
    :members:

VCTK (en)
---------

You can download data (15GB) from

.. note::

    VCTK data sources don't collect files for speaker ``315``, since there are no transcriptions available for ``315`` entries.

.. autoclass:: nnmnkwii.datasets.vctk.TranscriptionDataSource
    :members:

.. autoclass:: nnmnkwii.datasets.vctk.WavFileDataSource
    :members:

LJ-Speech (en)
--------------

You can download data (2.6GB) from

.. autoclass:: nnmnkwii.datasets.ljspeech.TranscriptionDataSource
    :members:

.. autoclass:: nnmnkwii.datasets.ljspeech.WavFileDataSource
    :members:

Voice Conversion Challenge (VCC) 2016 (en)
------------------------------------------

You can download training data (181MB) and evaluation data (~56 MB) from

.. autoclass:: nnmnkwii.datasets.vcc2016.WavFileDataSource
    :members:

Voice statistics (ja)
---------------------

You can download data (~720MB) from

.. autoclass:: nnmnkwii.datasets.voice_statistics.TranscriptionDataSource
    :members:

.. autoclass:: nnmnkwii.datasets.voice_statistics.WavFileDataSource
    :members:

JSUT (ja)
---------

JSUT (Japanese speech corpus of Saruwatari Lab, University of Tokyo).

You can download data (2.7GB) from

.. autoclass:: nnmnkwii.datasets.jsut.TranscriptionDataSource
    :members:

.. autoclass:: nnmnkwii.datasets.jsut.WavFileDataSource
    :members:

JVS (ja)
---------

JVS: free Japanese multi-speaker voice corpus.

You can download data from

.. autoclass:: nnmnkwii.datasets.jvs.TranscriptionDataSource
    :members:

.. autoclass:: nnmnkwii.datasets.jvs.WavFileDataSource
    :members:
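
Data source pattern in a nutshell
---------------------------------

All of the data sources listed on this page follow the same division of labor: the data source knows where the files are (``collect_files``), user code decides what features to compute (``collect_features``), and a dataset exposes the result through ``__getitem__`` and ``__len__``. The following is a minimal, self-contained sketch of that pattern. It does not use nnmnkwii itself: ``ToyWavFileDataSource``, ``FrameEnergySource``, ``ToyFileSourceDataset`` and ``write_sine_wav`` are hypothetical stand-ins written for illustration only, and the synthetic sine-wave files replace a real corpus so that the example runs anywhere NumPy is installed.

.. code-block:: python

    import os
    import tempfile
    import wave

    import numpy as np


    def write_sine_wav(path, fs=16000, n_samples=1600, freq=220.0):
        """Write a short mono 16-bit sine wave (stand-in for real corpus data)."""
        t = np.arange(n_samples) / fs
        x = (0.5 * np.sin(2 * np.pi * freq * t) * 32767).astype(np.int16)
        with wave.open(path, "wb") as f:
            f.setnchannels(1)
            f.setsampwidth(2)
            f.setframerate(fs)
            f.writeframes(x.tobytes())


    class ToyWavFileDataSource:
        """Hypothetical stand-in for a built-in source: it only locates files."""

        def __init__(self, data_root, max_files=None):
            self.data_root = data_root
            self.max_files = max_files

        def collect_files(self):
            paths = sorted(
                os.path.join(self.data_root, name)
                for name in os.listdir(self.data_root)
                if name.endswith(".wav")
            )
            # max_files=None keeps every file
            return paths[: self.max_files]

        def collect_features(self, path):
            raise NotImplementedError


    class FrameEnergySource(ToyWavFileDataSource):
        """User code: decides which features to compute from each wav file."""

        frame_length = 400  # samples per frame (25 ms at 16 kHz)

        def collect_features(self, path):
            with wave.open(path, "rb") as f:
                x = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)
            x = x.astype(np.float64) / 32768.0
            n_frames = len(x) // self.frame_length
            frames = x[: n_frames * self.frame_length].reshape(n_frames, -1)
            # One log-energy value per frame, i.e. a T x 1 array
            energy = np.log(np.sum(frames ** 2, axis=1, keepdims=True) + 1e-10)
            return energy.astype(np.float32)


    class ToyFileSourceDataset:
        """Hypothetical stand-in for an utterance-wise dataset."""

        def __init__(self, data_source):
            self.data_source = data_source
            self.paths = data_source.collect_files()

        def __getitem__(self, idx):
            return self.data_source.collect_features(self.paths[idx])

        def __len__(self):
            return len(self.paths)


    # Create three synthetic utterances, then collect only the first two
    data_root = tempfile.mkdtemp()
    for i in range(3):
        write_sine_wav(os.path.join(data_root, "utt%d.wav" % i), freq=220.0 * (i + 1))

    X = ToyFileSourceDataset(FrameEnergySource(data_root, max_files=2))
    features = [X[i] for i in range(len(X))]

In this sketch features are computed lazily, on each ``__getitem__`` call, so nothing is held in memory between accesses. Swapping the feature extraction means overriding ``collect_features`` only; the file discovery logic stays untouched.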