Deep neural networks with batch speaker normalization for intoxicated speech detection

Publication, Conference
Wang, W; Wu, H; Li, M
Published in: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
November 1, 2019

Alcohol intoxication affects people both physically and psychologically, and it also changes their speech. However, detecting the intoxicated state from speech is a challenging task. In this paper, we first implement a baseline model with ComParE features and then explore the influence of speaker information on the intoxication detection task. We also apply a ResNet18-based model to this task. The model contains three parts: a representation learning subnetwork built on an 18-layer deep residual network (ResNet), a global average pooling (GAP) layer, and a classifier of two fully connected layers. Since we cannot perform speaker z-normalization on variable-length feature input, we employ batch z-normalization to train the proposed model, which achieves an improvement similar to that of applying speaker normalization to the baseline method. Experimental results show that speaker normalization on the baseline model and batch z-normalization on the ResNet18-based model provide 4.9% and 3.8% improvement, respectively. Speaker normalization thus improves the performance of both the baseline model and the proposed model.
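
The two normalization schemes named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not say how batches are composed or which statistics are used, so everything here is an assumption for illustration: the function names `speaker_znorm` and `batch_znorm` are hypothetical, speaker z-normalization is shown on fixed-length utterance-level feature vectors, and batch z-normalization is shown as pooling mean and standard deviation over every frame of the variable-length utterances in one mini-batch. It is not the authors' implementation.

```python
import numpy as np


def speaker_znorm(features, speaker_ids, eps=1e-8):
    """Z-normalize fixed-length utterance features separately for each speaker.

    features:    (n_utterances, n_dims) array of utterance-level features.
    speaker_ids: length-n_utterances sequence of speaker labels.
    """
    features = np.asarray(features, dtype=np.float64)
    speaker_ids = np.asarray(speaker_ids)
    out = np.empty_like(features)
    for spk in np.unique(speaker_ids):
        mask = speaker_ids == spk
        mean = features[mask].mean(axis=0)
        std = features[mask].std(axis=0)
        # eps guards against zero variance in a feature dimension.
        out[mask] = (features[mask] - mean) / (std + eps)
    return out


def batch_znorm(batch, eps=1e-8):
    """Z-normalize a mini-batch of variable-length utterances.

    batch: list of (n_frames_i, n_dims) arrays; n_frames_i may differ.
    Assumption: statistics are pooled over all frames in the batch.
    """
    frames = np.concatenate(batch, axis=0)
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)
    return [(utt - mean) / (std + eps) for utt in batch]
```

Under this reading, if each mini-batch were drawn from a single speaker, the batch statistics would approximate that speaker's statistics without requiring fixed-length inputs; whether the paper composes batches this way is not stated in the abstract.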

Published In

2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019

DOI

10.1109/APSIPAASC47483.2019.9023074

Publication Date

November 1, 2019

Start / End Page

1323 / 1327
 

Citation

APA: Wang, W., Wu, H., & Li, M. (2019). Deep neural networks with batch speaker normalization for intoxicated speech detection. In 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019 (pp. 1323–1327). https://doi.org/10.1109/APSIPAASC47483.2019.9023074

Chicago: Wang, W., H. Wu, and M. Li. “Deep neural networks with batch speaker normalization for intoxicated speech detection.” In 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019, 1323–27, 2019. https://doi.org/10.1109/APSIPAASC47483.2019.9023074.

ICMJE: Wang W, Wu H, Li M. Deep neural networks with batch speaker normalization for intoxicated speech detection. In: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019. 2019. p. 1323–7.

MLA: Wang, W., et al. “Deep neural networks with batch speaker normalization for intoxicated speech detection.” 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019, 2019, pp. 1323–27. Scopus, doi:10.1109/APSIPAASC47483.2019.9023074.

NLM: Wang W, Wu H, Li M. Deep neural networks with batch speaker normalization for intoxicated speech detection. 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019. 2019. p. 1323–1327.
