IO

IO operations for some speech-specific file formats.

HTS-style full-context label file (a.k.a. HTK alignment)
HTS-style question file

HTS IO

`load`([path, lines])	Load an HTS-style label file.
`load_question_set`(qs_file_name)	Load an HTS-style question file and convert it to binary/continuous feature extraction regexes.

class nnmnkwii.io.hts.HTSLabelFile(frame_shift_in_micro_sec=50000)

In-memory representation of HTS-style context labels (a.k.a. HTK alignment).

Indexing is supported; it returns a tuple of (start_time, end_time, label).

start_times: list – Start times in microseconds.

end_times: list – End times in microseconds.

contexts: list – Contexts. Each value should be either a phone or a full-context annotation.

Examples

Load from file

>>> from nnmnkwii.io import hts
>>> from nnmnkwii.util import example_label_file
>>> labels = hts.load(example_label_file())
>>> print(labels[0])
(0, 50000, 'x^x-sil+hh=iy@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:1+1+2/D:0_0/E:x+x@x+x&x+x#x+x/F:content_1/G:0_0/H:x=x@1=2|0/I:4=3/J:13+9-2[2]')

Create an in-memory representation of labels

>>> labels = hts.HTSLabelFile()
>>> labels.append((0, 3125000, "silB"))
0 3125000 silB
>>> labels.append((3125000, 3525000, "m"))
0 3125000 silB
3125000 3525000 m
>>> labels.append((3525000, 4325000, "i"))
0 3125000 silB
3125000 3525000 m
3525000 4325000 i

Save to file

>>> from tempfile import TemporaryFile
>>> with TemporaryFile("w") as f:
...     f.write(str(labels))
50

append(label)

Append a single alignment label

Parameters:	label (tuple) – tuple of (start_time, end_time, context).
Returns:	self
Raises:	`ValueError` – if start_time >= end_time
	`ValueError` – if the last end time doesn’t match start_time
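The two `ValueError` conditions can be illustrated with a small stand-alone sketch. `TinyLabels` is a hypothetical class written only for this illustration; it is not the nnmnkwii implementation and only mirrors the behaviour documented above (tuple indexing, `append` returning self, and the two checks).

```python
class TinyLabels:
    """Hypothetical stand-in for a label container (not the nnmnkwii class)."""

    def __init__(self):
        self.start_times = []
        self.end_times = []
        self.contexts = []

    def append(self, label):
        start_time, end_time, context = label
        # Check 1: a segment must have a positive duration.
        if start_time >= end_time:
            raise ValueError(
                "start_time ({}) must be smaller than end_time ({})".format(
                    start_time, end_time))
        # Check 2: a segment must begin where the previous one ended.
        if self.end_times and self.end_times[-1] != start_time:
            raise ValueError(
                "start_time ({}) must equal the last end time ({})".format(
                    start_time, self.end_times[-1]))
        self.start_times.append(start_time)
        self.end_times.append(end_time)
        self.contexts.append(context)
        return self  # returning self allows chained appends

    def __len__(self):
        return len(self.contexts)

    def __getitem__(self, idx):
        return self.start_times[idx], self.end_times[idx], self.contexts[idx]

    def __str__(self):
        return "\n".join(
            "{} {} {}".format(s, e, c)
            for s, e, c in zip(self.start_times, self.end_times, self.contexts))


demo = TinyLabels()
demo.append((0, 3125000, "silB")).append((3125000, 3525000, "m"))

try:
    # Leaves a gap after 3525000, so the second check fails.
    demo.append((4000000, 4325000, "i"))
except ValueError as err:
    print("rejected:", err)
```

Rejecting gaps and overlaps at append time keeps the stored alignment contiguous, so later frame-level computations can rely on each segment starting where the previous one ended.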

load(path=None, lines=None)

Load labels from file

Parameters:	path (str) – File path
	lines (list) – Content of label file. If not None, construct HTSLabelFile directly from it instead of loading a file.
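To make the `lines` argument concrete, here is a minimal sketch of turning label-file lines into (start_time, end_time, context) tuples. `parse_label_lines` is a hypothetical helper, not part of nnmnkwii, and it assumes every line has the form `start end context` with both timestamps present, as in the examples above.

```python
def parse_label_lines(lines):
    """Parse 'start end context' lines into (start, end, context) tuples.

    Hypothetical helper for illustration; assumes both timestamps are present.
    """
    labels = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first two whitespace runs; the rest is the context.
        start, end, context = line.split(None, 2)
        labels.append((int(start), int(end), context))
    return labels


lines = [
    "0 3125000 silB",
    "3125000 3525000 m",
    "3525000 4325000 i",
]
parsed = parse_label_lines(lines)
print(parsed[1])  # (3125000, 3525000, 'm')
```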

num_states(): Returns the number of states, excluding the special begin/end states.

set_durations(durations, frame_shift_in_micro_sec=50000): Set start/end times from duration features

Todo

this should be refactored
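As a rough illustration of deriving start/end times from durations, the sketch below accumulates per-label durations into microsecond timestamps. `times_from_durations` is a hypothetical helper, and it assumes one integer duration per label, counted in frames, with the first label starting at time 0. The shape of the duration features that `set_durations` actually expects is not documented here and may differ.

```python
def times_from_durations(durations, frame_shift_in_micro_sec=50000):
    """Turn per-label durations (in frames) into start/end times (microseconds).

    Hypothetical helper; assumes the first label starts at time 0.
    """
    start_times, end_times = [], []
    current = 0
    for num_frames in durations:
        start_times.append(current)
        current += int(num_frames) * frame_shift_in_micro_sec
        end_times.append(current)
    return start_times, end_times


starts, ends = times_from_durations([3, 2, 4])
print(starts)  # [0, 150000, 250000]
print(ends)    # [150000, 250000, 450000]
```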

silence_frame_indices(regex=None, frame_shift_in_micro_sec=50000)

Returns silence frame indices

Similar to silence_label_indices(), but returns indices at the frame level.

Parameters:	regex (re(optional)) – Compiled regex to find silence labels.
Returns:	Silence frame indices
Return type:	1darray

silence_label_indices(regex=None)

Returns silence label indices

Parameters:	regex (re(optional)) – Compiled regex to find silence labels.
Returns:	Silence label indices
Return type:	1darray
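At the label level, the selection reduces to collecting the positions of contexts that match a compiled regex. The sketch below uses a hypothetical helper `silence_labels` and an assumed pattern that matches `sil` only in the current-phone position of a full-context label; neither is taken from nnmnkwii.

```python
import re


def silence_labels(contexts, regex):
    """Return positions of contexts that match regex (hypothetical helper)."""
    return [i for i, context in enumerate(contexts) if regex.search(context)]


contexts = [
    "x^x-sil+hh=iy",
    "x^sil-hh+iy=r",
    "sil^hh-iy+r=ah",
    "iy^r-sil+x=x",
]
# Assumed pattern: 'sil' between '-' and '+', i.e. the current phone.
print(silence_labels(contexts, re.compile(r"-sil\+")))  # [0, 3]
```

Anchoring the pattern between `-` and `+` avoids counting labels where silence appears only as a neighbouring phone.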

silence_phone_indices(regex=None)

Returns silence indices at the phone level

Parameters:	regex (re(optional)) – Compiled regex to find silence labels.
Returns:	Silence label indices
Return type:	1darray