Target-Speaker Voice Activity Detection Via Sequence-to-Sequence Prediction

Publication, Conference
Cheng, M; Wang, W; Zhang, Y; Qin, X; Li, M
Published in: ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings
January 1, 2023

Target-speaker voice activity detection is a promising approach to speaker diarization in complex acoustic environments. This paper presents a novel Sequence-to-Sequence Target-Speaker Voice Activity Detection (Seq2Seq-TSVAD) method that can efficiently handle the joint modeling of a large number of speakers and predict high-resolution voice activities. Experimental results show that larger speaker capacity and higher output resolution significantly reduce the diarization error rate (DER). The method achieves new state-of-the-art DERs of 4.55% on the VoxConverse test set and 10.77% on Track 1 of the DIHARD-III evaluation set under the widely used evaluation metrics.
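
The abstract reports results as DER without defining the metric. As a rough illustration only, the sketch below computes a frame-level DER as (missed speech + false alarm + speaker confusion) divided by total reference speech. It is not the paper's scoring procedure: the function name `frame_level_der`, the binary speaker-by-frame input layout, and the assumptions that hypothesis speakers are already mapped to reference speakers and that no forgiveness collar is applied are all choices made for this example.

```python
from typing import Sequence


def frame_level_der(
    ref: Sequence[Sequence[int]], hyp: Sequence[Sequence[int]]
) -> float:
    """Frame-level diarization error rate.

    ref and hyp are binary activity matrices indexed as [speaker][frame].
    Assumption: row i of hyp is already mapped to row i of ref.
    """
    if len(ref) != len(hyp) or any(len(r) != len(h) for r, h in zip(ref, hyp)):
        raise ValueError("ref and hyp must have the same shape")

    missed = false_alarm = confusion = total = 0
    # zip(*matrix) transposes [speaker][frame] into per-frame tuples.
    for ref_frame, hyp_frame in zip(zip(*ref), zip(*hyp)):
        n_ref = sum(1 for r in ref_frame if r)
        n_hyp = sum(1 for h in hyp_frame if h)
        n_correct = sum(1 for r, h in zip(ref_frame, hyp_frame) if r and h)

        missed += max(n_ref - n_hyp, 0)        # reference speech left undetected
        false_alarm += max(n_hyp - n_ref, 0)   # speech predicted where none exists
        confusion += min(n_ref, n_hyp) - n_correct  # speech given to the wrong speaker
        total += n_ref

    if total == 0:
        raise ValueError("reference contains no speech")
    return (missed + false_alarm + confusion) / total


# Two speakers, four frames; the hypothesis misses one speaker in the
# last two frames, so 2 of 5 reference speaker-frames are in error.
ref = [[1, 1, 1, 0],
       [0, 0, 1, 1]]
hyp = [[1, 1, 0, 0],
       [0, 0, 1, 0]]
print(frame_level_der(ref, hyp))  # 0.4
```

Because errors are counted per frame, the frame duration sets the finest granularity at which a system's output can match the reference, which is one way to read the abstract's point about output resolution.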

Published In

ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings

DOI

10.1109/ICASSP49357.2023.10094752

ISSN

1520-6149

Publication Date

January 1, 2023

Volume

2023-June
 

Citation

APA

Cheng, M., Wang, W., Zhang, Y., Qin, X., & Li, M. (2023). Target-Speaker Voice Activity Detection Via Sequence-to-Sequence Prediction. In ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings (Vol. 2023-June). https://doi.org/10.1109/ICASSP49357.2023.10094752

Chicago

Cheng, M., W. Wang, Y. Zhang, X. Qin, and M. Li. “Target-Speaker Voice Activity Detection Via Sequence-to-Sequence Prediction.” In ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, Vol. 2023-June, 2023. https://doi.org/10.1109/ICASSP49357.2023.10094752.

ICMJE

Cheng M, Wang W, Zhang Y, Qin X, Li M. Target-Speaker Voice Activity Detection Via Sequence-to-Sequence Prediction. In: ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings. 2023.

MLA

Cheng, M., et al. “Target-Speaker Voice Activity Detection Via Sequence-to-Sequence Prediction.” ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, vol. 2023-June, 2023. Scopus, doi:10.1109/ICASSP49357.2023.10094752.

NLM

Cheng M, Wang W, Zhang Y, Qin X, Li M. Target-Speaker Voice Activity Detection Via Sequence-to-Sequence Prediction. ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings. 2023.
