Target-Speaker Voice Activity Detection Via Sequence-to-Sequence Prediction
Publication, Conference
Cheng, M; Wang, W; Zhang, Y; Qin, X; Li, M
Published in: ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings
January 1, 2023
Target-speaker voice activity detection is currently a promising approach for speaker diarization in complex acoustic environments. This paper presents a novel Sequence-to-Sequence Target-Speaker Voice Activity Detection (Seq2Seq-TSVAD) method that efficiently models a large number of speakers jointly and predicts high-resolution voice activities. Experimental results show that a larger speaker capacity and a higher output resolution can significantly reduce the diarization error rate (DER). The method achieves new state-of-the-art DERs of 4.55% on the VoxConverse test set and 10.77% on Track 1 of the DIHARD-III evaluation set under the widely used evaluation metrics.
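The abstract reports its results as DER. As a hedged illustration of that metric only (this is not code from the paper), the sketch below computes a frame-level DER from binary speaker-activity matrices, the kind of per-target-speaker, per-frame output a TS-VAD system produces. It assumes hypothesis column k already corresponds to reference speaker k. It also omits the optimal speaker mapping and any forgiveness collar that standard scoring tools may apply. The function name `frame_der` is hypothetical.

```python
from typing import Sequence


def frame_der(ref: Sequence[Sequence[int]], hyp: Sequence[Sequence[int]]) -> float:
    """Frame-level DER for binary activity matrices shaped [frames][speakers].

    Assumption: hyp column k is already mapped to ref column k (no optimal
    speaker mapping, no collar). Per frame:
      miss        = max(N_ref - N_hyp, 0)
      false alarm = max(N_hyp - N_ref, 0)
      confusion   = min(N_ref, N_hyp) - N_correct
    DER = total errors / total reference speaker-frames.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")
    errors = 0
    total_ref = 0
    for r_frame, h_frame in zip(ref, hyp):
        if len(r_frame) != len(h_frame):
            raise ValueError("ref and hyp must have the same number of speakers")
        n_ref = sum(1 for x in r_frame if x)      # active reference speakers
        n_hyp = sum(1 for x in h_frame if x)      # active hypothesis speakers
        n_correct = sum(1 for r, h in zip(r_frame, h_frame) if r and h)
        miss = max(n_ref - n_hyp, 0)
        false_alarm = max(n_hyp - n_ref, 0)
        confusion = min(n_ref, n_hyp) - n_correct
        errors += miss + false_alarm + confusion
        total_ref += n_ref
    if total_ref == 0:
        raise ValueError("reference contains no speech")
    return errors / total_ref


if __name__ == "__main__":
    ref = [[1, 0], [1, 1], [0, 1], [0, 0]]
    hyp = [[1, 0], [1, 0], [1, 0], [0, 1]]
    print(frame_der(ref, hyp))  # 0.75: one miss, one confusion, one false alarm over 4 reference speaker-frames
```

In this toy example, frame 1 misses one of two overlapping speakers, frame 2 attributes speech to the wrong speaker, and frame 3 reports speech where there is none. That gives 3 errors over 4 reference speaker-frames. Finer frame resolution, as discussed in the abstract, simply means each row covers a shorter time span.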
Published In
ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings
DOI
10.1109/ICASSP49357.2023.10094752
ISSN
1520-6149
Publication Date
January 1, 2023
Volume
2023-June
Citation
APA
Cheng, M., Wang, W., Zhang, Y., Qin, X., & Li, M. (2023). Target-Speaker Voice Activity Detection Via Sequence-to-Sequence Prediction. In ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings (Vol. 2023-June). https://doi.org/10.1109/ICASSP49357.2023.10094752
Chicago
Cheng, M., W. Wang, Y. Zhang, X. Qin, and M. Li. “Target-Speaker Voice Activity Detection Via Sequence-to-Sequence Prediction.” In ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, Vol. 2023-June, 2023. https://doi.org/10.1109/ICASSP49357.2023.10094752.
ICMJE
Cheng M, Wang W, Zhang Y, Qin X, Li M. Target-Speaker Voice Activity Detection Via Sequence-to-Sequence Prediction. In: ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings. 2023.
MLA
Cheng, M., et al. “Target-Speaker Voice Activity Detection Via Sequence-to-Sequence Prediction.” ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, vol. 2023-June, 2023. Scopus, doi:10.1109/ICASSP49357.2023.10094752.
NLM
Cheng M, Wang W, Zhang Y, Qin X, Li M. Target-Speaker Voice Activity Detection Via Sequence-to-Sequence Prediction. ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings. 2023.