THE DKU AUDIO-VISUAL WAKE WORD SPOTTING SYSTEM FOR THE 2021 MISP CHALLENGE

Publication, Conference
Cheng, M; Wang, H; Wang, Y; Li, M
Published in: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
January 1, 2022

This paper describes the system developed by the DKU team for the 2021 MISP Challenge. We present a two-stage approach, built from end-to-end neural networks, for the audio-visual wake word spotting task. In the first stage, we process the audio and video data into a similar structure and separately train two unimodal models that share a unified network architecture. In the second stage, we propose a Hierarchical Modality Aggregation (HMA) module that fuses multi-scale audio-visual information from the pre-trained unimodal models. The resulting framework is clear and concise. With this framework and extensive data augmentation, our system achieves a false reject rate of 3.85% and a false alarm rate of 3.42% on far-field audio in the development set of the competition database, and it ranks 2nd in the wake word spotting track of the MISP Challenge.
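The two reported metrics, false reject rate and false alarm rate, can be illustrated with a minimal sketch. It assumes each clip has a binary label (1 if the wake word is present) and a detector score compared against a decision threshold. The function name `frr_far`, the threshold of 0.5, and the toy labels and scores are hypothetical and are not taken from the paper or from the challenge's scoring tools.

```python
from typing import Sequence, Tuple


def frr_far(
    labels: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,  # assumed operating point, not from the paper
) -> Tuple[float, float]:
    """Return (false reject rate, false alarm rate) for a binary detector.

    labels: 1 if the clip contains the wake word, 0 otherwise.
    scores: detector outputs; a clip is accepted when score >= threshold.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative clip")

    # False reject: wake word present, but the detector did not fire.
    false_rejects = sum(
        1 for y, s in zip(labels, scores) if y == 1 and s < threshold
    )
    # False alarm: wake word absent, but the detector fired.
    false_alarms = sum(
        1 for y, s in zip(labels, scores) if y == 0 and s >= threshold
    )
    return false_rejects / positives, false_alarms / negatives


# Toy data (hypothetical): 4 positive clips, 5 negative clips.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.1, 0.6, 0.2, 0.3, 0.05]
frr, far = frr_far(labels, scores)
print(f"FRR = {frr:.2%}, FAR = {far:.2%}")
```

Raising the threshold trades false alarms for false rejects, so the two rates are only meaningful when quoted together at the same operating point, as in the abstract.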

Published In

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

DOI

10.1109/ICASSP43922.2022.9747216

ISSN

1520-6149

Publication Date

January 1, 2022

Volume

2022-May

Start / End Page

9256 / 9260

Citation

APA: Cheng, M., Wang, H., Wang, Y., & Li, M. (2022). THE DKU AUDIO-VISUAL WAKE WORD SPOTTING SYSTEM FOR THE 2021 MISP CHALLENGE. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 2022-May, pp. 9256–9260). https://doi.org/10.1109/ICASSP43922.2022.9747216
Chicago: Cheng, M., H. Wang, Y. Wang, and M. Li. “THE DKU AUDIO-VISUAL WAKE WORD SPOTTING SYSTEM FOR THE 2021 MISP CHALLENGE.” In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2022-May:9256–60, 2022. https://doi.org/10.1109/ICASSP43922.2022.9747216.
ICMJE: Cheng M, Wang H, Wang Y, Li M. THE DKU AUDIO-VISUAL WAKE WORD SPOTTING SYSTEM FOR THE 2021 MISP CHALLENGE. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2022. p. 9256–60.
MLA: Cheng, M., et al. “THE DKU AUDIO-VISUAL WAKE WORD SPOTTING SYSTEM FOR THE 2021 MISP CHALLENGE.” ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2022-May, 2022, pp. 9256–60. Scopus, doi:10.1109/ICASSP43922.2022.9747216.
NLM: Cheng M, Wang H, Wang Y, Li M. THE DKU AUDIO-VISUAL WAKE WORD SPOTTING SYSTEM FOR THE 2021 MISP CHALLENGE. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2022. p. 9256–9260.
