Scholars@Duke publication: Deep neural networks with batch speaker normalization for intoxicated speech detection

Deep neural networks with batch speaker normalization for intoxicated speech detection

Publication, Conference

Wang, W; Wu, H; Li, M

Published in: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019

November 1, 2019

Alcohol intoxication affects people both physically and psychologically, and it also changes their speech. Detecting the intoxicated state from speech, however, remains a challenging task. In this paper, we first implement a baseline model with ComParE features and then explore the influence of speaker information on the intoxication detection task. We also apply a ResNet18-based model to this task. The model has three parts: a representation learning subnetwork built on an 18-layer deep residual network (ResNet), a global average pooling (GAP) layer, and a classifier of two fully connected layers. Because speaker z-normalization cannot be performed on variable-length feature input, we train the proposed model with batch z-normalization instead, which yields an improvement similar to that of applying speaker normalization to the baseline method. Experimental results show that speaker normalization on the baseline model and batch z-normalization on the ResNet18-based model provide improvements of 4.9% and 3.8%, respectively, indicating that speaker normalization benefits both the baseline and the proposed model.
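The two normalization schemes named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the inputs are assumed to be fixed-length vectors (for example utterance-level ComParE functionals, or embeddings taken after the GAP layer), and the abstract does not say how training batches are composed or where in the network the batch statistics are applied.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def z_normalize(vectors, eps=1e-8):
    """Z-normalize each feature dimension across a group of equal-length vectors."""
    columns = list(zip(*vectors))            # one tuple per feature dimension
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population standard deviation
    return [
        [(x - m) / (s + eps) for x, m, s in zip(vec, means, stds)]
        for vec in vectors
    ]


def speaker_z_normalize(features, speaker_ids):
    """Normalize every utterance with the statistics of its own speaker."""
    groups = defaultdict(list)
    for index, speaker in enumerate(speaker_ids):
        groups[speaker].append(index)
    normalized = [None] * len(features)
    for indices in groups.values():
        group_norm = z_normalize([features[i] for i in indices])
        for i, vec in zip(indices, group_norm):
            normalized[i] = vec
    return normalized


def batch_z_normalize(batch):
    """Normalize a batch with its own statistics.

    Assumption: the batch is built so that its statistics stand in for
    speaker statistics (e.g. all utterances come from one speaker).
    """
    return z_normalize(batch)


# Toy data: two speakers, two utterances each, two feature dimensions.
features = [[1.0, 10.0], [3.0, 30.0], [2.0, 0.0], [4.0, 4.0]]
speakers = ["a", "a", "b", "b"]
per_speaker = speaker_z_normalize(features, speakers)
per_batch = batch_z_normalize(features[:2])  # a batch drawn from speaker "a"
```

Under that single-speaker batch assumption, the batch statistics coincide with the speaker statistics, so both calls return the same values for the utterances of speaker "a". A speaker with only one utterance has zero variance, and the `eps` term then maps that utterance to all zeros instead of dividing by zero.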

Duke Scholars

Author Ming Li DKU Faculty

Published In

2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019

DOI

10.1109/APSIPAASC47483.2019.9023074

Publication Date

November 1, 2019

Start / End Page

1323 / 1327

Citation

APA

Wang, W., Wu, H., & Li, M. (2019). Deep neural networks with batch speaker normalization for intoxicated speech detection. In 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019 (pp. 1323–1327). https://doi.org/10.1109/APSIPAASC47483.2019.9023074

Chicago

Wang, W., H. Wu, and M. Li. “Deep neural networks with batch speaker normalization for intoxicated speech detection.” In 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019, 1323–27, 2019. https://doi.org/10.1109/APSIPAASC47483.2019.9023074.

ICMJE

Wang W, Wu H, Li M. Deep neural networks with batch speaker normalization for intoxicated speech detection. In: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019. 2019. p. 1323–7.

MLA

Wang, W., et al. “Deep neural networks with batch speaker normalization for intoxicated speech detection.” 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019, 2019, pp. 1323–27. Scopus, doi:10.1109/APSIPAASC47483.2019.9023074.

NLM

Wang W, Wu H, Li M. Deep neural networks with batch speaker normalization for intoxicated speech detection. 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019. 2019. p. 1323–1327.
