Intoxicated speech detection by fusion of speaker normalized hierarchical features and GMM supervectors
Speaker state recognition is a challenging problem because of speaker and context variability. Intoxication detection is an important area of paralinguistic speech research with potential real-world applications. In this work, we build upon a base set of static acoustic features by combining several methods for this learning task: extracting hierarchical acoustic features, performing iterative speaker normalization, and using a set of GMM supervectors. Score-level fusion of these subsystems gives our best unweighted average recall (UAR) for intoxication recognition: 70.54% on the test set, an improvement of 4.64% absolute (7.04% relative) over the baseline UAR of 65.9%. Copyright © 2011 ISCA.
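As an illustration of the evaluation metric and the fusion step named above, the following is a minimal sketch, not the authors' implementation. It assumes equal-weight averaging of per-utterance subsystem scores and a 0.5 decision threshold, neither of which is specified in the abstract. The function names, the toy labels, and the toy scores are hypothetical. The last lines recompute the reported absolute and relative gains from the two figures given in the abstract.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally
    regardless of how many utterances it has."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def fuse_scores(subsystem_scores, threshold=0.5):
    """Score-level fusion by equal-weight averaging (an assumption here),
    followed by thresholding into 1 (intoxicated) or 0 (sober)."""
    n = len(subsystem_scores)
    fused = [sum(col) / n for col in zip(*subsystem_scores)]
    return [1 if s >= threshold else 0 for s in fused]


# Hypothetical toy data: 2 intoxicated and 4 sober utterances.
y_true = [1, 1, 0, 0, 0, 0]
scores = [
    [0.9, 0.4, 0.2, 0.6, 0.1, 0.3],  # stand-in for hierarchical features
    [0.8, 0.3, 0.4, 0.7, 0.2, 0.1],  # stand-in for speaker-normalized features
    [0.7, 0.6, 0.3, 0.1, 0.3, 0.2],  # stand-in for GMM supervectors
]
y_pred = fuse_scores(scores)
uar = unweighted_average_recall(y_true, y_pred)

# Gains recomputed from the figures reported in the abstract.
baseline_uar, fused_uar = 65.9, 70.54
absolute_gain = fused_uar - baseline_uar
relative_gain = 100.0 * absolute_gain / baseline_uar
print(y_pred, uar, round(absolute_gain, 2), round(relative_gain, 2))
```

On the toy data, plain accuracy would be 5/6 while the unweighted average recall is lower, because the missed intoxicated utterance halves the recall of the smaller class. This is why the metric suits class-imbalanced tasks.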