IO

IO operations for the following speech-specific file formats:

HTS-style full-context label file (a.k.a. HTK alignment)
HTS-style question file

HTS IO

`load`([path, lines])	Load an HTS-style label file
`load_question_set`(qs_file_name[, …])	Load an HTS-style question file and convert it to binary/continuous feature extraction regexes.
`write_audacity_labels`(dst_path, labels)	Write Audacity labels from HTS-style labels
`write_textgrid`(dst_path, labels)	Write a TextGrid from HTS-style labels
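
As a rough illustration of what exporting to Audacity involves: an Audacity label track is a plain-text file with one tab-separated `start`, `end`, `label` triple per line, with times in seconds. The sketch below converts HTS-style times (100 ns units) to that layout. The helper name `to_audacity_lines` is hypothetical, and the exact formatting produced by `write_audacity_labels` may differ.

```python
# Hypothetical helper for illustration; not part of nnmnkwii.
HTS_UNITS_PER_SECOND = 10_000_000  # HTS times are in units of 100 ns


def to_audacity_lines(labels):
    """Format (start, end, context) tuples as Audacity label-track lines."""
    return [
        f"{start / HTS_UNITS_PER_SECOND:.6f}\t{end / HTS_UNITS_PER_SECOND:.6f}\t{context}"
        for start, end, context in labels
    ]


audacity_lines = to_audacity_lines([(0, 3125000, "silB"), (3125000, 3525000, "m")])
print("\n".join(audacity_lines))
```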

class nnmnkwii.io.hts.HTSLabelFile(frame_shift=50000)

In-memory representation of HTS-style context labels (a.k.a. HTK alignment).

Indexing is supported and returns a tuple of (start_time, end_time, label).

start_times

Start times in 100 ns units.

Type: list

end_times

End times in 100 ns units.

Type: list

contexts

Contexts. Each entry should be either a phone label or a full-context label.

Type: list
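
Because times are stored in 100 ns units, the default `frame_shift=50000` corresponds to 5 ms. A minimal sketch of the unit arithmetic follows; the helper names are hypothetical and are not part of the library, and flooring to a frame index is an assumption made here for illustration.

```python
# Hypothetical helpers for illustration; not part of nnmnkwii.
HTS_UNITS_PER_SECOND = 10_000_000  # one unit is 100 ns


def hts_to_seconds(t):
    """Convert a time in 100 ns units to seconds."""
    return t / HTS_UNITS_PER_SECOND


def hts_to_frame(t, frame_shift=50000):
    """Convert a time in 100 ns units to a frame index (rounding down)."""
    return t // frame_shift


start, end = 3125000, 3525000
seconds = (hts_to_seconds(start), hts_to_seconds(end))  # (0.3125, 0.3525)
frames = (hts_to_frame(start), hts_to_frame(end))       # (62, 70)
```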

Examples

Load from file

>>> from nnmnkwii.io import hts
>>> from nnmnkwii.util import example_label_file
>>> labels = hts.load(example_label_file())
>>> print(labels[0])
(0, 50000, 'x^x-sil+hh=iy@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:1+1+2/D:0_0/E:x+x@x+x&x+x#x+x/F:content_1/G:0_0/H:x=x@1=2|0/I:4=3/J:13+9-2[2]')

Create memory representation of label

>>> labels = hts.HTSLabelFile()
>>> labels.append((0, 3125000, "silB"))
0 3125000 silB
>>> labels.append((3125000, 3525000, "m"))
0 3125000 silB
3125000 3525000 m
>>> labels.append((3525000, 4325000, "i"))
0 3125000 silB
3125000 3525000 m
3525000 4325000 i

Save to file

>>> from tempfile import TemporaryFile
>>> with TemporaryFile("w") as f:
...     f.write(str(labels))
50

append(label, strict=True)

Append a single alignment label.

Parameters

label (tuple) – tuple of (start_time, end_time, context).
strict (bool) – strict mode.

Returns

self

Raises

ValueError – if start_time >= end_time
ValueError – if start_time does not match the end time of the last label
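
The two error conditions can be sketched in plain Python. `LabelListSketch` below is a hypothetical stand-in and not the library's implementation; it also assumes that `strict=True` is what enables both checks.

```python
class LabelListSketch:
    """Hypothetical stand-in for illustration; not nnmnkwii's HTSLabelFile."""

    def __init__(self):
        self.start_times, self.end_times, self.contexts = [], [], []

    def append(self, label, strict=True):
        start, end, context = label
        if strict:  # assumption: strict mode gates both checks
            if start >= end:
                raise ValueError(f"start_time ({start}) must be < end_time ({end})")
            if self.end_times and self.end_times[-1] != start:
                raise ValueError(
                    f"start_time ({start}) must equal last end time ({self.end_times[-1]})"
                )
        self.start_times.append(start)
        self.end_times.append(end)
        self.contexts.append(context)
        return self  # returning self allows chained calls


labels = LabelListSketch()
labels.append((0, 3125000, "silB")).append((3125000, 3525000, "m"))

try:
    labels.append((4000000, 4325000, "i"))  # leaves a gap after 3525000
    gap_rejected = False
except ValueError:
    gap_rejected = True
```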

load(path=None, lines=None)

Load labels from a file.

Parameters

path (str) – File path
lines (list) – Content of the label file. If not None, construct the HTSLabelFile directly from it instead of loading a file.

Raises

ValueError – if the content of labels is empty.
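
To illustrate what passing `lines` means, here is a minimal sketch that parses label lines of the form `start end context` into tuples. The function `parse_label_lines` is hypothetical; it assumes every line carries both times and does not cover whatever other line layouts the library's loader may accept.

```python
def parse_label_lines(lines):
    """Parse 'start end context' lines into (start, end, context) tuples.

    Hypothetical helper for illustration; not part of nnmnkwii.
    """
    labels = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on whitespace at most twice so the context stays in one piece.
        start, end, context = line.split(None, 2)
        labels.append((int(start), int(end), context))
    if not labels:
        raise ValueError("label content is empty")
    return labels


parsed = parse_label_lines(["0 3125000 silB\n", "3125000 3525000 m\n"])
```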

num_states()

Returns the number of states, excluding the special begin/end states.

set_durations(durations, frame_shift=50000)

Set start/end times from duration features.

Todo

this should be refactored
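
The idea of deriving start/end times from durations can be sketched as a running sum. This assumes one duration per label, counted in frames, which is a simplification; the method's actual input layout (for example, state-level durations) is not described here, and `times_from_durations` is a hypothetical name.

```python
from itertools import accumulate


def times_from_durations(durations, frame_shift=50000):
    """Turn per-label frame counts into start/end times in 100 ns units.

    Hypothetical helper for illustration; not part of nnmnkwii.
    """
    # Cumulative frame counts give each label's end frame.
    ends = [n * frame_shift for n in accumulate(durations)]
    # Each label starts where the previous one ended; the first starts at 0.
    starts = [0] + ends[:-1] if ends else []
    return starts, ends


starts, ends = times_from_durations([3, 2, 4])
# starts == [0, 150000, 250000]
# ends   == [150000, 250000, 450000]
```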

silence_frame_indices(regex=None, frame_shift=50000)

Returns silence frame indices.

Similar to silence_label_indices(), but returns indices at the frame level.

Parameters: regex (re, optional) – Compiled regex to find silence labels.
Returns: Silence frame indices
Return type: 1darray
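
As a sketch of what frame-level indices mean: each label whose context matches the regex contributes every frame between its start and end time. The helper `silence_frames` and the regex used below are assumptions for illustration, and the library's default regex and rounding may differ.

```python
import re


def silence_frames(labels, regex, frame_shift=50000):
    """Collect frame indices covered by labels whose context matches regex.

    Hypothetical helper for illustration; not part of nnmnkwii.
    """
    frames = []
    for start, end, context in labels:
        if regex.search(context):
            # Frames in the half-open interval [start, end).
            frames.extend(range(start // frame_shift, end // frame_shift))
    return frames


segments = [(0, 150000, "sil"), (150000, 250000, "m"), (250000, 350000, "sil")]
frame_idx = silence_frames(segments, re.compile(r"sil"))  # [0, 1, 2, 5, 6]
```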

silence_label_indices(regex=None)

Returns silence label indices.

Parameters: regex (re, optional) – Compiled regex to find silence labels.
Returns: Silence label indices
Return type: 1darray
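
A compiled regex for this parameter typically targets the current phone, which in an HTS full-context label sits between `-` and `+`. The sketch below shows the matching idea on plain strings; the pattern and the helper `silence_labels` are assumptions for illustration, not the library's defaults.

```python
import re


def silence_labels(contexts, regex):
    """Return indices of contexts that match regex.

    Hypothetical helper for illustration; not part of nnmnkwii.
    """
    return [i for i, context in enumerate(contexts) if regex.search(context)]


contexts = ["x^x-sil+hh=iy", "x^sil-hh+iy=l", "sil^hh-iy+l=ow", "ow^l-sil+x=x"]
# Match "sil" only when it is the current phone, i.e. between "-" and "+".
current_sil = re.compile(r"-sil\+")
label_idx = silence_labels(contexts, current_sil)  # [0, 3]
```

Anchoring the pattern on the `-` and `+` delimiters avoids counting labels where `sil` appears only as a neighbouring phone.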

silence_phone_indices(regex=None)

Returns silence indices at the phone level.

Parameters: regex (re, optional) – Compiled regex to find silence labels.
Returns: Silence phone indices
Return type: 1darray