Speaker verification based on the fusion of speech acoustics and inverted articulatory signals

Publication: Journal Article
Li, M; Kim, J; Lammert, A; Ghosh, PK; Ramanarayanan, V; Narayanan, S
Published in: Computer Speech and Language
March 1, 2016

We propose a practical feature-level and score-level fusion approach that combines acoustic and estimated articulatory information for both text-independent and text-dependent speaker verification. From a practical point of view, we study how to improve speaker verification performance by combining dynamic articulatory information with conventional acoustic features. For text-independent speaker verification, we find that concatenating articulatory features obtained from measured speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves performance dramatically. However, since directly measuring articulatory data is infeasible in many real-world applications, we also experiment with estimated articulatory features obtained through acoustic-to-articulatory inversion. We explore both feature-level and score-level fusion methods and find that overall system performance is significantly enhanced even with estimated articulatory features. This performance boost may be due to the inter-speaker variation information embedded in the estimated articulatory features. Because the dynamics of articulation carry important information, we also include inverted articulatory trajectories in text-dependent speaker verification. We demonstrate that the articulatory constraints introduced by inverted articulatory features help reject wrong-password trials and improve performance after score-level fusion. We evaluate the proposed methods on the X-ray Microbeam database and the RSR2015 database, respectively, for the two tasks. Experimental results show more than a 15% relative equal error rate reduction for both speaker verification tasks.
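The abstract describes two generic fusion strategies: feature-level fusion (concatenating per-frame articulatory features with MFCCs before modeling) and score-level fusion (linearly combining the scores of the acoustic and articulatory subsystems). Below is a minimal sketch of both operations, assuming per-frame feature matrices and already-computed subsystem scores as NumPy arrays; it illustrates the general technique only, not the authors' implementation, and all names and values (feature_level_fusion, score_level_fusion, the 0.6 weight) are hypothetical.

import numpy as np

def feature_level_fusion(mfcc, articulatory):
    # Concatenate per-frame MFCCs with (estimated) articulatory features.
    # mfcc:         (num_frames, mfcc_dim) array
    # articulatory: (num_frames, artic_dim) array, e.g. inverted trajectories
    # returns:      (num_frames, mfcc_dim + artic_dim) fused feature matrix
    assert mfcc.shape[0] == articulatory.shape[0], "frame counts must match"
    return np.hstack([mfcc, articulatory])

def score_level_fusion(acoustic_score, articulatory_score, weight=0.5):
    # Weighted-sum fusion of the two subsystem verification scores.
    # In practice the weight is tuned on a development set and the scores
    # are first calibrated/normalized to a comparable range.
    return weight * acoustic_score + (1.0 - weight) * articulatory_score

# Example: fuse 13-dim MFCCs with 14-dim articulatory trajectories (200 frames)
fused = feature_level_fusion(np.random.randn(200, 13), np.random.randn(200, 14))
print(fused.shape)                              # (200, 27)
print(score_level_fusion(1.2, 0.8, weight=0.6)) # fused verification score

The fused feature matrix would feed the same front end as the acoustic-only system, whereas score-level fusion leaves the two subsystems independent and combines their outputs per trial.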

Published In

Computer Speech and Language

DOI

10.1016/j.csl.2015.05.003

EISSN

1095-8363

ISSN

0885-2308

Publication Date

March 1, 2016

Volume

36

Start / End Page

196 / 211

Related Subject Headings

  • Speech-Language Pathology & Audiology
  • 46 Information and computing sciences
  • 40 Engineering
  • 2004 Linguistics
  • 1702 Cognitive Sciences
  • 0801 Artificial Intelligence and Image Processing
 

Citation

APA: Li, M., Kim, J., Lammert, A., Ghosh, P. K., Ramanarayanan, V., & Narayanan, S. (2016). Speaker verification based on the fusion of speech acoustics and inverted articulatory signals. Computer Speech and Language, 36, 196–211. https://doi.org/10.1016/j.csl.2015.05.003
Chicago: Li, M., J. Kim, A. Lammert, P. K. Ghosh, V. Ramanarayanan, and S. Narayanan. “Speaker verification based on the fusion of speech acoustics and inverted articulatory signals.” Computer Speech and Language 36 (March 1, 2016): 196–211. https://doi.org/10.1016/j.csl.2015.05.003.
ICMJE: Li M, Kim J, Lammert A, Ghosh PK, Ramanarayanan V, Narayanan S. Speaker verification based on the fusion of speech acoustics and inverted articulatory signals. Computer Speech and Language. 2016 Mar 1;36:196–211.
MLA: Li, M., et al. “Speaker verification based on the fusion of speech acoustics and inverted articulatory signals.” Computer Speech and Language, vol. 36, Mar. 2016, pp. 196–211. Scopus, doi:10.1016/j.csl.2015.05.003.
NLM: Li M, Kim J, Lammert A, Ghosh PK, Ramanarayanan V, Narayanan S. Speaker verification based on the fusion of speech acoustics and inverted articulatory signals. Computer Speech and Language. 2016 Mar 1;36:196–211.