IO

IO operations for some speech-specific file formats.

  • HTS-style full-context label file (a.k.a. HTK alignment)
  • HTS-style question file

HTS IO

load([path, lines]) – Load an HTS-style label file.
load_question_set(qs_file_name) – Load an HTS-style question file and convert it to regexes for binary/continuous feature extraction.
class nnmnkwii.io.hts.HTSLabelFile(frame_shift_in_micro_sec=50000)

In-memory representation of HTS-style context labels (a.k.a. HTK alignment).

Indexing is supported: labels[i] returns a tuple of (start_time, end_time, label).

start_times

Start times in microseconds.

Type: list
end_times

End times in microseconds.

Type: list
contexts

Contexts. Each entry should be either a phone label or a full-context label.

Type: list

Examples

Load from file

>>> from nnmnkwii.io import hts
>>> from nnmnkwii.util import example_label_file
>>> labels = hts.load(example_label_file())
>>> print(labels[0])
(0, 50000, 'x^x-sil+hh=iy@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:1+1+2/D:0_0/E:x+x@x+x&x+x#x+x/F:content_1/G:0_0/H:x=x@1=2|0/I:4=3/J:13+9-2[2]')

Create an in-memory representation of labels

>>> labels = hts.HTSLabelFile()
>>> labels.append((0, 3125000, "silB"))
0 3125000 silB
>>> labels.append((3125000, 3525000, "m"))
0 3125000 silB
3125000 3525000 m
>>> labels.append((3525000, 4325000, "i"))
0 3125000 silB
3125000 3525000 m
3525000 4325000 i

Save to file

>>> from tempfile import TemporaryFile
>>> with TemporaryFile("w") as f:
...     f.write(str(labels))
50
append(label)

Append a single alignment label

Parameters:

label (tuple) – tuple of (start_time, end_time, context).

Returns:

self

Raises:
  • ValueError – if start_time >= end_time
  • ValueError – if start_time does not match the end time of the previous label
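
The two checks listed above can be illustrated without the library. The following is a minimal sketch on a plain Python list, not the HTSLabelFile implementation; the helper name append_label and the error messages are hypothetical.

```python
def append_label(labels, label):
    """Append (start_time, end_time, context) to a plain list.

    Hypothetical helper that mirrors the two documented checks;
    it is not the library's code.
    """
    start_time, end_time, _context = label
    # Check 1: a label must have positive duration.
    if start_time >= end_time:
        raise ValueError("start_time must be smaller than end_time")
    # Check 2: a label must begin where the previous one ended.
    if labels and labels[-1][1] != start_time:
        raise ValueError("start_time must match the previous end time")
    labels.append(label)
    return labels


labels = []
append_label(labels, (0, 3125000, "silB"))
append_label(labels, (3125000, 3525000, "m"))

# A label that leaves a gap after 3525000 is rejected.
try:
    append_label(labels, (4000000, 4325000, "i"))
    gap_rejected = False
except ValueError:
    gap_rejected = True
```

Returning the container, as append() is documented to return self, allows calls to be chained.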
load(path=None, lines=None)

Load labels from file

Parameters:
  • path (str) – File path
  • lines (list) – Content of a label file. If not None, the HTSLabelFile is constructed directly from it instead of being loaded from a file.
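
To make the lines argument concrete, here is a rough sketch of how label lines might be split into the three attributes. It assumes each line has the whitespace-separated form "start end context" shown in the printed labels above; parse_label_lines is a hypothetical helper, not the library's parser.

```python
def parse_label_lines(lines):
    """Split 'start end context' lines into three parallel lists.

    Assumes every non-empty line has exactly this three-field layout.
    """
    start_times, end_times, contexts = [], [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        start_times.append(int(fields[0]))
        end_times.append(int(fields[1]))
        contexts.append(fields[2])
    return start_times, end_times, contexts


lines = ["0 3125000 silB\n", "3125000 3525000 m\n"]
start_times, end_times, contexts = parse_label_lines(lines)
```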
num_states()

Returns the number of states, excluding the special begin/end states.

set_durations(durations, frame_shift_in_micro_sec=50000)

Set start/end times from duration features

Todo

this should be refactored
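
As an illustration of the idea behind set_durations, the sketch below turns per-label durations into start/end times by accumulating them. It assumes durations are given as a flat list of frame counts, one per label; the library may accept other shapes. The helper durations_to_times is hypothetical.

```python
def durations_to_times(durations_in_frames, frame_shift_in_micro_sec=50000):
    """Convert per-label frame counts into start and end times.

    Assumption: one integer duration (in frames) per label.
    """
    start_times, end_times = [], []
    offset = 0
    for num_frames in durations_in_frames:
        start_times.append(offset)
        offset += num_frames * frame_shift_in_micro_sec
        end_times.append(offset)
    return start_times, end_times


starts, ends = durations_to_times([2, 3, 1])
# starts == [0, 100000, 250000]
# ends   == [100000, 250000, 300000]
```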

silence_frame_indices(regex=None, frame_shift_in_micro_sec=50000)

Returns silence frame indices

Similar to silence_label_indices(), but returns indices at the frame level.

Parameters: regex (re(optional)) – Compiled regex to find silence labels.
Returns: Silence frame indices
Return type: 1darray
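
The relation between label-level and frame-level indices can be sketched as follows. This is an assumption-laden illustration, not the library's code: it works on plain (start_time, end_time, context) tuples, returns a list rather than a 1darray, and uses a made-up silence pattern.

```python
import re


def silence_frame_indices(labels, regex, frame_shift_in_micro_sec=50000):
    """Collect the frame indices covered by labels that match regex."""
    indices = []
    for start_time, end_time, context in labels:
        if regex.search(context):
            first = start_time // frame_shift_in_micro_sec
            last = end_time // frame_shift_in_micro_sec
            # Each matching label spans frames [first, last).
            indices.extend(range(first, last))
    return indices


labels = [(0, 100000, "sil"), (100000, 250000, "m"), (250000, 350000, "sil")]
# Hypothetical pattern for plain phone labels; full-context labels need another.
silence = silence_frame_indices(labels, re.compile(r"^sil$"))
# silence == [0, 1, 5, 6]
```

Indices like these are typically used to drop silence frames from a frame-aligned feature matrix.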
silence_label_indices(regex=None)

Returns silence label indices

Parameters: regex (re(optional)) – Compiled regex to find silence labels.
Returns: Silence label indices
Return type: 1darray
silence_phone_indices(regex=None)

Returns phone-level silence indices

Parameters: regex (re(optional)) – Compiled regex to find silence labels.
Returns: Silence label indices
Return type: 1darray