
Speaker verification based on fusion of acoustic and articulatory information

Publication, Conference
Li, M; Kim, J; Ghosh, P; Ramanarayanan, V; Narayanan, S
Published in: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
January 1, 2013

We propose a practical, feature-level fusion approach for combining acoustic and articulatory information in a speaker verification task. We find that concatenating articulatory features obtained from measured speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves overall speaker verification performance. However, since access to measured articulatory data is impractical for real-world speaker verification applications, we also experiment with estimated articulatory features obtained using an acoustic-to-articulatory inversion technique. Specifically, we show that augmenting MFCCs with articulatory features obtained from a subject-independent acoustic-to-articulatory inversion technique also significantly enhances speaker verification performance. This performance boost could be due to the information about inter-speaker variation present in the estimated articulatory features, especially at the mean and variance level. Experimental results on the Wisconsin X-Ray Microbeam database show that the proposed acoustic-estimated-articulatory fusion approach significantly outperforms the traditional acoustic-only baseline, providing up to 10% relative reduction in Equal Error Rate (EER). We further show that we can achieve an additional 5% relative reduction in EER after score-level fusion. Copyright © 2013 ISCA.
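The two fusion strategies the abstract describes can be sketched in a few lines. This is an illustrative outline only, not the paper's implementation: the function names, array shapes, and the equal-weight score combination below are assumptions for demonstration. Feature-level fusion concatenates per-frame MFCCs with (measured or inversion-estimated) articulatory features; score-level fusion then combines the verification scores of two systems.

```python
import numpy as np

def feature_level_fusion(mfcc, artic):
    """Concatenate MFCC and articulatory feature vectors frame by frame.

    mfcc:  (n_frames, n_mfcc) array of acoustic features
    artic: (n_frames, n_artic) array of articulatory features
           (measured, or estimated via acoustic-to-articulatory inversion)
    """
    assert mfcc.shape[0] == artic.shape[0], "frame counts must match"
    return np.concatenate([mfcc, artic], axis=1)

def score_level_fusion(score_acoustic, score_fused, w=0.5):
    """Weighted sum of two systems' verification scores.

    The weight w is a hypothetical tuning parameter, not a value
    taken from the paper.
    """
    return w * score_acoustic + (1.0 - w) * score_fused

# Toy example: 100 frames, 13 MFCCs, 6 articulatory trajectories
mfcc = np.random.randn(100, 13)
artic = np.random.randn(100, 6)
fused = feature_level_fusion(mfcc, artic)
print(fused.shape)  # (100, 19)
```

The fused (n_frames, 19) features would then feed a standard speaker-verification back end in place of the MFCCs alone.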


Published In

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

EISSN

1990-9772

ISSN

2308-457X

Publication Date

January 1, 2013

Start / End Page

1614 / 1618
 

Citation

Li, M., Kim, J., Ghosh, P., Ramanarayanan, V., & Narayanan, S. (2013). Speaker verification based on fusion of acoustic and articulatory information. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 1614–1618).
