nnmnkwii.frontend.merlin.linguistic_features

nnmnkwii.frontend.merlin.linguistic_features(hts_labels, *args, **kwargs)

Linguistic features from HTS-style full-context labels.

This converts HTS-style full-context labels to their numeric representation, given feature extraction regexes that should be constructed from an HTS-style question set. The input full-context labels must be aligned at either the phone level or the state level.
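As a rough illustration of the idea, the following minimal sketch applies a few regexes to a single label string. It is not the library's implementation: the toy question dictionaries, the shortened label string, the helper names `binary_features` and `numeric_features`, and the use of -1 for a non-matching numeric question are all assumptions made for this example.

```python
import re

# Hypothetical, heavily simplified question set. Each binary question is a
# list of compiled regexes; a real HTS question file has hundreds of entries.
toy_binary_questions = {
    0: [re.compile(r"-a\+")],                        # current phone is "a"?
    1: [re.compile(r"-sil\+")],                      # current phone is silence?
    2: [re.compile(r"\^k-"), re.compile(r"\^g-")],   # left phone is "k" or "g"?
}

# Each numeric question is one regex with a capture group holding the value.
toy_numeric_questions = {
    0: re.compile(r"@(\d+)_"),
}


def binary_features(label, questions):
    """Return 1 per question if any of its patterns matches, else 0."""
    return [
        1 if any(p.search(label) for p in questions[i]) else 0
        for i in sorted(questions)
    ]


def numeric_features(label, questions, missing=-1.0):
    """Return the captured number per question, or `missing` (assumed) if absent."""
    values = []
    for i in sorted(questions):
        m = questions[i].search(label)
        values.append(float(m.group(1)) if m else missing)
    return values


# Shortened, made-up label in the p1^p2-p3+p4=p5@... style.
label = "x^k-a+t=o@3_2"
binary_vec = binary_features(label, toy_binary_questions)
numeric_vec = numeric_features(label, toy_numeric_questions)
print(binary_vec + numeric_vec)
```

In this sketch, the feature vector for one label is simply the binary answers followed by the numeric values.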

Note

The implementation is adapted from Merlin, but no internal algorithms are changed. Unit tests ensure that it produces the same results as Merlin for several typical settings.

Parameters

hts_labels (hts.HTSLabelFile) – Input full-context label file.
binary_dict (dict) – Dictionary used to extract binary features.
numeric_dict (dict) – Dictionary used to extract continuous features.
subphone_features (str) – Type of sub-phone features. According to Merlin’s source code, None, full, state_only, frame_only, uniform_state, minimal_phoneme and coarse_coding are supported. However, only None, full (for state alignment) and coarse_coding (for phone alignment) are tested in this library. Default is None.
add_frame_features (bool) – Whether to add frame-level features. Default is False.
frame_shift (int) – Frame shift of the alignment, in 100 ns units.
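Because times in HTS labels and frame_shift share the same 100 ns unit, converting between milliseconds, label times, and frame counts is plain integer arithmetic. The sketch below shows this under stated assumptions: the 5 ms frame shift, the segment boundaries, and the helper names `ms_to_100ns` and `num_frames` are hypothetical and are not part of the library.

```python
UNITS_PER_MS = 10_000  # 1 ms = 1,000,000 ns = 10,000 units of 100 ns


def ms_to_100ns(milliseconds):
    """Convert milliseconds to 100 ns units."""
    return int(milliseconds * UNITS_PER_MS)


def num_frames(start, end, frame_shift):
    """Number of whole frames in a segment; all arguments in 100 ns units."""
    return (end - start) // frame_shift


# Assumed 5 ms frame shift for this example.
frame_shift = ms_to_100ns(5)

# Hypothetical segment from 0 s to 0.25 s, expressed in 100 ns units.
start, end = 0, 2_500_000

print(frame_shift)
print(num_frames(start, end, frame_shift))
```

With these assumed values, the frame shift is 50000 units and the segment covers 50 frames.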

Returns

Numpy array representation of linguistic features.

Return type

numpy.ndarray

Examples

For state-level labels

>>> from nnmnkwii.frontend import merlin as fe
>>> from nnmnkwii.io import hts
>>> from nnmnkwii.util import example_label_file, example_question_file
>>> labels = hts.load(example_label_file(phone_level=False))
>>> binary_dict, numeric_dict = hts.load_question_set(example_question_file())
>>> features = fe.linguistic_features(labels, binary_dict, numeric_dict,
...     subphone_features="full", add_frame_features=True)
>>> features.shape
(615, 425)
>>> features = fe.linguistic_features(labels, binary_dict, numeric_dict,
...     subphone_features=None, add_frame_features=False)
>>> features.shape
(40, 416)

For phone-level labels

>>> from nnmnkwii.frontend import merlin as fe
>>> from nnmnkwii.io import hts
>>> from nnmnkwii.util import example_label_file, example_question_file
>>> labels = hts.load(example_label_file(phone_level=True))
>>> binary_dict, numeric_dict = hts.load_question_set(example_question_file())
>>> features = fe.linguistic_features(labels, binary_dict, numeric_dict,
...     subphone_features="coarse_coding", add_frame_features=True)
>>> features.shape
(615, 420)
>>> features = fe.linguistic_features(labels, binary_dict, numeric_dict,
...     subphone_features=None, add_frame_features=False)
>>> features.shape
(40, 416)
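The shapes in the examples above can be read as a base dimension plus a number of extra sub-phone columns. The sketch below only restates that arithmetic from the printed shapes; the `SUBPHONE_EXTRA` table and the `feature_dim` helper are hypothetical, inferred from these outputs rather than taken from the library.

```python
# Shapes copied from the examples above.
BASE_DIM = 416     # subphone_features=None
FULL_DIM = 425     # subphone_features="full"
COARSE_DIM = 420   # subphone_features="coarse_coding"

# Extra columns per sub-phone feature type, inferred from the shapes.
SUBPHONE_EXTRA = {
    None: 0,
    "full": FULL_DIM - BASE_DIM,
    "coarse_coding": COARSE_DIM - BASE_DIM,
}


def feature_dim(base_dim, subphone_features=None):
    """Expected number of columns for the sub-phone types covered above."""
    return base_dim + SUBPHONE_EXTRA[subphone_features]


print(SUBPHONE_EXTRA["full"], SUBPHONE_EXTRA["coarse_coding"])
print(feature_dim(BASE_DIM, "full"))
```

The number of rows also differs: with add_frame_features=True the examples return one row per frame (615), and with add_frame_features=False they return 40 rows.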