IO

IO operations for speech-specific file formats. Currently only reading is supported, for the following formats:

  • HTS-style question file
  • HTS-style full-context label file

HTS IO

load(path[, frame_shift_in_micro_sec]) – Load an HTS-style label file.
load_question_set(qs_file_name) – Load an HTS-style question file and convert it to regexes for binary/continuous feature extraction.
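
To illustrate what converting a question into a feature extraction regex can look like, here is a minimal sketch that does not use nnmnkwii itself. It assumes the common HTS question line form `QS "name" {pattern,pattern}` with `*` and `?` wildcards. The helper names `wildcard_to_regex` and `parse_question_line` are hypothetical, and the library's actual conversion (including its handling of continuous `CQS` questions) may differ.

```python
import re


def wildcard_to_regex(pattern):
    """Convert an HTS wildcard pattern ('*' and '?') to a compiled regex."""
    # Escape regex metacharacters first, then restore the two HTS wildcards.
    escaped = re.escape(pattern)
    escaped = escaped.replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile(escaped)


def parse_question_line(line):
    """Parse one 'QS "name" {pat1,pat2}' line into (name, [regex, ...])."""
    m = re.match(r'\s*QS\s+"([^"]+)"\s+\{(.*)\}\s*$', line)
    if m is None:
        raise ValueError("not a QS line: %r" % line)
    name, body = m.groups()
    return name, [wildcard_to_regex(p.strip()) for p in body.split(",")]


# Hypothetical question: is the current phone a silence or a pause?
name, regexes = parse_question_line('QS "C-Silence" {*-sil+*,*-pau+*}')

# A binary feature is 1 if any pattern of the question matches the label.
label = "x^x-sil+hh=iy@x_x/A:0_0_0"
is_silence = int(any(r.search(label) for r in regexes))
```

Each binary question contributes one 0/1 value per label, so a full question set yields one feature vector per label.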
class nnmnkwii.io.hts.HTSLabelFile(frame_shift_in_micro_sec=50000)[source]

Memory representation for HTS-style context labels file.

Indexing is supported. It returns tuple of (start_time, end_time, label).

frame_shift_in_ms

int – Frame shift in microseconds

start_times

ndarray – Start times

end_times

ndarray – End times

contexts

ndarray – Contexts.

Examples

>>> from nnmnkwii.io import hts
>>> from nnmnkwii.util import example_label_file
>>> labels = hts.load(example_label_file())
>>> print(labels[0])
(0, 50000, 'x^x-sil+hh=iy@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:1+1+2/D:0_0/E:x+x@x+x&x+x#x+x/F:content_1/G:0_0/H:x=x@1=2|0/I:4=3/J:13+9-2[2]')
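
For readers without the library at hand, the tuple layout above can be reproduced with a small standalone sketch. It assumes each line of a label file has the form `start end context`, separated by whitespace. The function `parse_label_lines` and the shortened context strings are hypothetical and only illustrate the data layout, not the library's own parser.

```python
def parse_label_lines(lines):
    """Parse 'start end context' lines into three parallel lists."""
    start_times, end_times, contexts = [], [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split into at most three fields so the context stays intact.
        start, end, context = line.split(None, 2)
        start_times.append(int(start))
        end_times.append(int(end))
        contexts.append(context)
    return start_times, end_times, contexts


# Hypothetical, shortened label lines for illustration.
lines = [
    "0 50000 x^x-sil+hh=iy@x_x/A:0_0_0",
    "50000 100000 x^sil-hh+iy=z@1_2/A:0_0_0",
]
starts, ends, contexts = parse_label_lines(lines)

# Indexing position 0 corresponds to this (start_time, end_time, label) tuple.
first = (starts[0], ends[0], contexts[0])
```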
load(path)[source]

Load labels from file

Parameters: path (str) – File path
num_states()[source]

Returns the number of states, excluding the special begin/end states.

set_durations(durations)[source]

Set start/end times from duration features

Todo

this should be refactored
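
The idea of deriving start/end times from durations can be sketched as a running sum. This is a minimal sketch under the assumption that each duration is a number of frames for one label and that times are expressed in the same unit as the frame shift. The function `times_from_durations` is hypothetical, and the library's method may treat its input differently.

```python
def times_from_durations(durations_in_frames, frame_shift):
    """Accumulate per-label durations into contiguous start/end times."""
    start_times, end_times = [], []
    offset = 0
    for d in durations_in_frames:
        start_times.append(offset)
        # Each label ends where the next one starts.
        offset += int(d) * frame_shift
        end_times.append(offset)
    return start_times, end_times


starts, ends = times_from_durations([1, 3, 2], frame_shift=50000)
# starts == [0, 50000, 200000]
# ends   == [50000, 200000, 300000]
```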

silence_frame_indices(regex=None)[source]

Returns silence frame indices

Similar to silence_label_indices(), but returns indices at the frame level.

Parameters: regex (re, optional) – Compiled regex to find silence labels.
Returns: Silence frame indices
Return type: 1darray
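
To make the frame-level view concrete, the sketch below expands every label matched by a silence regex into the frame indices it covers. It is an illustration only: the function `silence_frames`, the regex, and the shortened contexts are hypothetical, and it assumes that label times are multiples of the frame shift.

```python
import re


def silence_frames(start_times, end_times, contexts, frame_shift, regex):
    """Return the indices of all frames covered by labels matching regex."""
    frames = []
    for start, end, context in zip(start_times, end_times, contexts):
        if regex.search(context):
            # A label spanning [start, end) covers these frame indices.
            frames.extend(range(start // frame_shift, end // frame_shift))
    return frames


regex = re.compile(r"-sil\+")  # assumed pattern for a silence phone
starts = [0, 100000, 250000]
ends = [100000, 250000, 300000]
contexts = ["x^x-sil+hh=iy", "x^sil-hh+iy=z", "hh^iy-sil+x=x"]

frames = silence_frames(starts, ends, contexts, 50000, regex)
# frames == [0, 1, 5]
```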
silence_label_indices(regex=None)[source]

Returns silence label indices

Parameters: regex (re, optional) – Compiled regex to find silence labels.
Returns: Silence label indices
Return type: 1darray
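
At the label level, the same idea reduces to collecting the positions of matching contexts. The following is a minimal sketch: the function `silence_labels` is hypothetical, and the fallback pattern is an assumption about what a reasonable default could be, not a statement of the library's default.

```python
import re


def silence_labels(contexts, regex=None):
    """Return the positions of contexts whose current phone is silence."""
    if regex is None:
        # Assumed fallback pattern: current phone is 'sil'.
        regex = re.compile(r".*-sil\+.*")
    return [i for i, context in enumerate(contexts) if regex.match(context)]


contexts = ["x^x-sil+hh=iy", "x^sil-hh+iy=z", "hh^iy-sil+x=x"]
indices = silence_labels(contexts)
# indices == [0, 2]
```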
silence_phone_indices(regex=None)[source]

Returns phone-level frame indices

Parameters: regex (re, optional) – Compiled regex to find silence labels.
Returns: Silence label indices
Return type: 1darray