End-To-End deep learning framework for speech paralinguistics detection based on perception aware spectrum

Publication, Conference
Cai, D; Ni, Z; Liu, W; Cai, W; Li, G; Li, M
Published in: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
January 1, 2017

In this paper, we propose an end-to-end deep learning framework to detect speech paralinguistics using a perception aware spectrum as input. Existing studies show that, compared with speech produced under a 'healthy' condition, speech produced under a cold shows distinct variations in the energy distribution of its low-frequency components. This motivates us to use a perception aware spectrum as the input to an end-to-end learning framework trained on a small-scale dataset. In this work, we try both the Constant Q Transform (CQT) spectrum and the Gammatone spectrum in different end-to-end deep learning networks; both spectra closely mimic human speech perception and turn the speech signal into 2D images. Experimental results on the Interspeech 2017 Computational Paralinguistics Cold sub-Challenge show the effectiveness of the proposed perception aware spectrum combined with the end-to-end deep learning approach. The final fusion result of our proposed method is 8% better than the provided baseline in terms of unweighted average recall (UAR).
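The abstract does not say how the CQT spectrum was computed. As a minimal sketch, assuming Brown's original constant-Q definition (centre frequencies f_k = f_min · 2^(k/b), quality factor Q = 1/(2^(1/b) − 1), window length N_k = ceil(Q · f_s / f_k)), a naive NumPy CQT that turns a waveform into a 2D time-frequency image might look like this. Because the windows are longest at low frequencies, the CQT resolves low-frequency components more finely than a linear-frequency STFT. The helper name `cqt_spectrogram` and its defaults (`fmin=100.0`, 48 bins at 12 bins per octave, a 160-sample hop, i.e. 10 ms at 16 kHz) are hypothetical choices, not the authors' settings.

```python
import numpy as np


def cqt_spectrogram(x, fs, fmin=100.0, n_bins=48, bins_per_octave=12, hop=160):
    """Naive constant-Q magnitude spectrogram (hypothetical helper, Brown-style CQT).

    Returns (centre_freqs, S) where S has shape (n_bins, n_frames).
    """
    x = np.asarray(x, dtype=float)
    # Quality factor: every bin spans the same fraction of an octave.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    # Geometrically spaced centre frequencies: f_k = fmin * 2^(k / b).
    freqs = fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    if freqs.max() >= fs / 2:
        raise ValueError("highest CQT bin must lie below the Nyquist frequency")
    # Window length per bin: long windows (fine frequency resolution) at low f.
    lengths = np.ceil(q * fs / freqs).astype(int)

    # One Hann-windowed complex kernel per bin, normalised by its length N_k.
    kernels = [
        np.hanning(n_k) * np.exp(-2j * np.pi * q * np.arange(n_k) / n_k) / n_k
        for n_k in lengths
    ]

    # Zero-pad so every centred analysis window stays inside the array.
    half = lengths.max() // 2 + 1
    padded = np.pad(x, (half, half))
    n_frames = 1 + len(x) // hop
    S = np.zeros((n_bins, n_frames))
    for t in range(n_frames):
        centre = half + t * hop  # frame t is centred on sample t * hop of x
        for k, (n_k, kernel) in enumerate(zip(lengths, kernels)):
            start = centre - n_k // 2
            S[k, t] = np.abs(np.dot(padded[start:start + n_k], kernel))
    return freqs, S
```

For a 1 s, 400 Hz sine sampled at 16 kHz, `S` has shape (48, 101), and in a middle frame the strongest bin is index 24, whose centre frequency is exactly 400 Hz. The per-bin dot product is slow compared with FFT-based constant-Q kernels; it is used here only because it mirrors the definition directly.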
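The abstract likewise gives no construction for the Gammatone spectrum. One common recipe, assumed here rather than taken from the paper, is a bank of 4th-order gammatone filters whose centre frequencies are equally spaced on the Glasberg and Moore ERB-rate scale, followed by the log energy of each channel per frame. The function names, the 32-channel 50 Hz to 4 kHz range, the 64 ms impulse-response length, and the 400-sample frame / 160-sample hop (25 ms / 10 ms at 16 kHz) are all hypothetical choices for illustration.

```python
import numpy as np


def erb_space(fmin, fmax, n):
    """n centre frequencies equally spaced on the ERB-rate scale (Glasberg & Moore 1990)."""
    def to_erb_rate(f):
        return 21.4 * np.log10(4.37e-3 * f + 1.0)

    def from_erb_rate(e):
        return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3

    return from_erb_rate(np.linspace(to_erb_rate(fmin), to_erb_rate(fmax), n))


def gammatone_ir(fc, fs, duration=0.064, order=4, b=1.019):
    """Sampled gammatone impulse response centred at fc, scaled to unit energy."""
    t = np.arange(int(round(duration * fs))) / fs
    erb = 24.7 * (4.37e-3 * fc + 1.0)  # equivalent rectangular bandwidth in Hz
    g = t ** (order - 1) * np.exp(-2.0 * np.pi * b * erb * t) * np.cos(2.0 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))


def gammatone_spectrogram(x, fs, n_channels=32, fmin=50.0, fmax=4000.0, frame=400, hop=160):
    """Log frame energy of each gammatone channel: an (n_channels, n_frames) image."""
    x = np.asarray(x, dtype=float)
    centres = erb_space(fmin, fmax, n_channels)
    n_frames = 1 + (len(x) - frame) // hop
    S = np.empty((n_channels, n_frames))
    for c, fc in enumerate(centres):
        # Causal FIR filtering: keep the first len(x) samples of the full convolution.
        y = np.convolve(x, gammatone_ir(fc, fs))[: len(x)]
        for t in range(n_frames):
            seg = y[t * hop : t * hop + frame]
            S[c, t] = np.log(np.mean(seg ** 2) + 1e-10)  # small floor avoids log(0)
    return centres, S
```

Feeding a 1 s sine tuned to one channel's centre frequency yields a (32, 98) image whose steady-state frames peak in that channel. Spacing channels by ERB rather than by Hz packs more of them into the low-frequency region, which is what makes this representation "perception aware".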
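UAR is the mean of the per-class recalls, so every class counts equally regardless of how many utterances it has; with two classes, a system that always predicts the majority class scores only 50%. A minimal sketch of the metric follows; the function name and the "C" / "NC" (cold / not cold) labels in the usage example are illustrative assumptions.

```python
from collections import Counter


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)  # number of reference items per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# A majority-class predictor on imbalanced data: 90% accuracy, but UAR = 0.5.
print(unweighted_average_recall(["NC"] * 9 + ["C"], ["NC"] * 10))  # 0.5
```

The same quantity is available in scikit-learn as `recall_score(..., average="macro")`.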

Published In

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

DOI

10.21437/Interspeech.2017-1445

EISSN

1990-9772

ISSN

2308-457X

Publication Date

January 1, 2017

Volume

2017-August

Start / End Page

3452 / 3456
 

Citation

APA

Cai, D., Ni, Z., Liu, W., Cai, W., Li, G., & Li, M. (2017). End-To-End deep learning framework for speech paralinguistics detection based on perception aware spectrum. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (Vol. 2017-August, pp. 3452–3456). https://doi.org/10.21437/Interspeech.2017-1445

Chicago

Cai, D., Z. Ni, W. Liu, W. Cai, G. Li, and M. Li. “End-To-End deep learning framework for speech paralinguistics detection based on perception aware spectrum.” In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2017-August:3452–56, 2017. https://doi.org/10.21437/Interspeech.2017-1445.

ICMJE

Cai D, Ni Z, Liu W, Cai W, Li G, Li M. End-To-End deep learning framework for speech paralinguistics detection based on perception aware spectrum. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2017. p. 3452–6.

MLA

Cai, D., et al. “End-To-End deep learning framework for speech paralinguistics detection based on perception aware spectrum.” Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2017-August, 2017, pp. 3452–56. Scopus, doi:10.21437/Interspeech.2017-1445.

NLM

Cai D, Ni Z, Liu W, Cai W, Li G, Li M. End-To-End deep learning framework for speech paralinguistics detection based on perception aware spectrum. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2017. p. 3452–3456.
