IO

IO operations for some speech-specific file formats.

  • HTS-style full-context label file (a.k.a. HTK alignment)

  • HTS-style question file

HTS IO

load([path, lines])

Load HTS-style label file

load_question_set(qs_file_name[, …])

Load an HTS-style question file and convert it to regexes for binary/continuous feature extraction.

write_audacity_labels(dst_path, labels)

Write Audacity labels from HTS-style labels

write_textgrid(dst_path, labels)

Write TextGrid from HTS-style labels

class nnmnkwii.io.hts.HTSLabelFile(frame_shift=50000)[source]

Memory representation for HTS-style context labels (a.k.a. HTK alignment).

Indexing is supported and returns a tuple of (start_time, end_time, label).

start_times

Start times in 100ns units.

Type

list

end_times

End times in 100ns units.

Type

list

contexts

Contexts. Each value should be either a phone or a full-context annotation.

Type

list
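Because times are stored as integers in units of 100 ns, converting them to seconds or to frame indices is plain scaling. The following is a minimal sketch in standard Python. The helper names to_seconds and to_frame are hypothetical and not part of nnmnkwii, and the default of 50000 simply mirrors the frame_shift default in the signatures on this page (50000 x 100 ns = 5 ms).

```python
# Hypothetical helpers for illustration; not part of nnmnkwii.
HTK_UNITS_PER_SECOND = 10_000_000  # 1 s = 10^7 units of 100 ns


def to_seconds(t):
    """Convert a time in 100 ns units to seconds."""
    return t / HTK_UNITS_PER_SECOND


def to_frame(t, frame_shift=50000):
    """Convert a time in 100 ns units to a frame index (floor division)."""
    return t // frame_shift


start, end = 3125000, 3525000  # a segment expressed in 100 ns units
print(to_seconds(start), to_seconds(end))  # 0.3125 0.3525
print(to_frame(start), to_frame(end))      # 62 70
```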

Examples

Load from file

>>> from nnmnkwii.io import hts
>>> from nnmnkwii.util import example_label_file
>>> labels = hts.load(example_label_file())
>>> print(labels[0])
(0, 50000, 'x^x-sil+hh=iy@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:1+1+2/D:0_0/E:x+x@x+x&x+x#x+x/F:content_1/G:0_0/H:x=x@1=2|0/I:4=3/J:13+9-2[2]')

Create memory representation of label

>>> labels = hts.HTSLabelFile()
>>> labels.append((0, 3125000, "silB"))
0 3125000 silB
>>> labels.append((3125000, 3525000, "m"))
0 3125000 silB
3125000 3525000 m
>>> labels.append((3525000, 4325000, "i"))
0 3125000 silB
3125000 3525000 m
3525000 4325000 i

Save to file

>>> from tempfile import TemporaryFile
>>> with TemporaryFile("w") as f:
...     f.write(str(labels))
50

append(label, strict=True)[source]

Append a single alignment label

Parameters
  • label (tuple) – tuple of (start_time, end_time, context).

  • strict (bool) – Strict mode. If True, the label is validated and the errors listed under Raises are raised.

Returns

self

Raises
  • ValueError – if start_time >= end_time

  • ValueError – if start_time doesn’t match the end time of the previous label
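The two error conditions can be read as invariants on the label sequence: every segment has positive length, and consecutive segments are contiguous. The sketch below re-creates those checks on a plain list of end times. It is an illustration, not the library's implementation. check_append is a hypothetical name, and the assumption that strict=True enables both checks is ours.

```python
def check_append(end_times, label, strict=True):
    """Validate a (start_time, end_time, context) tuple against prior end times.

    Hypothetical helper; assumes strict mode enables both checks.
    """
    start_time, end_time, _context = label
    if not strict:
        return
    if start_time >= end_time:
        raise ValueError("start_time must be smaller than end_time")
    if end_times and start_time != end_times[-1]:
        raise ValueError("start_time must equal the previous end time")


end_times = [3125000]
check_append(end_times, (3125000, 3525000, "m"))  # contiguous: accepted

for bad in [(3525000, 3525000, "i"),   # zero-length segment
            (3200000, 3525000, "i")]:  # does not start at the previous end
    try:
        check_append(end_times, bad)
    except ValueError as e:
        print("rejected:", e)
```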

load(path=None, lines=None)[source]

Load labels from file

Parameters
  • path (str) – File path

  • lines (list) – Content of label file. If not None, construct HTSLabelFile directly from it instead of loading a file.

Raises

ValueError – if the content of labels is empty.
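As the printed output in the Examples shows, each label occupies one line of the form "start_time end_time context". The sketch below parses such lines with the standard library only, to illustrate what the lines argument is expected to contain. parse_label_lines is a hypothetical helper, and the real loader may accept further variants (for example, lines without time stamps) that this sketch ignores.

```python
def parse_label_lines(lines):
    """Parse 'start end context' lines into three parallel lists.

    Hypothetical helper for illustration; not nnmnkwii's loader.
    """
    start_times, end_times, contexts = [], [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        start, end, context = line.split(None, 2)
        start_times.append(int(start))
        end_times.append(int(end))
        contexts.append(context)
    if not contexts:
        raise ValueError("Empty label content")
    return start_times, end_times, contexts


lines = ["0 3125000 silB", "3125000 3525000 m", "3525000 4325000 i"]
starts, ends, contexts = parse_label_lines(lines)
print(list(zip(starts, ends, contexts))[1])  # (3125000, 3525000, 'm')
```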

num_states()[source]

Returns the number of states, excluding the special begin/end states.

set_durations(durations, frame_shift=50000)[source]

Set start/end times from duration features

Todo

this should be refactored
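One plausible reading of "set start/end times from duration features" is a cumulative sum: each label's duration, given in frames, is multiplied by frame_shift and accumulated so that the segments are contiguous. The sketch below shows that arithmetic on plain lists. It assumes one duration per label and a configurable starting offset. times_from_durations is a hypothetical name, and the library's own handling (for instance of state-level durations) may differ.

```python
from itertools import accumulate


def times_from_durations(durations, frame_shift=50000, offset=0):
    """Turn per-label durations (in frames) into start/end times (100 ns units).

    Hypothetical helper for illustration; not nnmnkwii's implementation.
    """
    if not durations:
        return [], []
    # Running total of frames, scaled to 100 ns units and shifted by offset.
    end_times = [offset + frame_shift * d for d in accumulate(durations)]
    # Each segment starts where the previous one ended.
    start_times = [offset] + end_times[:-1]
    return start_times, end_times


starts, ends = times_from_durations([3, 2, 4])
print(starts)  # [0, 150000, 250000]
print(ends)    # [150000, 250000, 450000]
```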

silence_frame_indices(regex=None, frame_shift=50000)[source]

Returns silence frame indices

Similar to silence_label_indices(), but returns indices at the frame level.

Parameters

regex (re, optional) – Compiled regex to find silence labels.

Returns

Silence frame indices

Return type

1darray
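The relation between label-level and frame-level indices can be sketched as follows: each silence label covers the frames from its start time to its end time, both divided by frame_shift. This is a plain-Python illustration, not the library's implementation. The function name and the shortened context strings are hypothetical, the pattern r".*-sil\+.*" is an assumed regex that matches "sil" as the center phone of a full-context label, and the result is a Python list rather than an array.

```python
import re


def sketch_silence_frame_indices(labels, regex, frame_shift=50000):
    """Return frame indices covered by labels whose context matches regex.

    labels: list of (start_time, end_time, context), times in 100 ns units.
    Hypothetical helper for illustration only.
    """
    indices = []
    for start, end, context in labels:
        if regex.match(context):
            indices.extend(range(start // frame_shift, end // frame_shift))
    return indices


labels = [
    (0, 150000, "x^x-sil+hh=iy"),        # silence: frames 0, 1, 2
    (150000, 250000, "x^sil-hh+iy=ax"),  # speech: frames 3, 4
    (250000, 350000, "hh^iy-sil+x=x"),   # silence: frames 5, 6
]
silence = re.compile(r".*-sil\+.*")      # assumed pattern, see lead-in
print(sketch_silence_frame_indices(labels, silence))  # [0, 1, 2, 5, 6]
```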

silence_label_indices(regex=None)[source]

Returns silence label indices

Parameters

regex (re, optional) – Compiled regex to find silence labels.

Returns

Silence label indices

Return type

1darray

silence_phone_indices(regex=None)[source]

Returns phone-level silence indices

Parameters

regex (re, optional) – Compiled regex to find silence labels.

Returns

Silence phone indices

Return type

1darray