Two-stage and Self-supervised Voice Conversion for Zero-Shot Dysarthric Speech Reconstruction

Publication, Conference
Liu, D; Lin, Y; Bu, H; Li, M
Published in: Proceedings of 2024 International Conference on Asian Language Processing, IALP 2024
January 1, 2024

Dysarthria is a motor speech disorder commonly associated with conditions such as cerebral palsy, Parkinson's disease, amyotrophic lateral sclerosis, and stroke. Individuals with dysarthria typically exhibit significant speech difficulties, including imprecise articulation, lack of fluency, slow speech rate, and decreased volume and clarity, which can hinder their ability to communicate effectively with others. We propose a two-stage voice conversion method to enhance the reconstruction of dysarthric speech. In the first stage, we develop a KNN-VC approach based on a same-gender-retrieval strategy to preliminarily repair the dysarthric speech: the dysarthric speech is matched only against normal speech from speakers of the same gender. In the second stage, we adapt so-vits-svc to restore the speaker's timbre and improve the sound quality of the speech repaired in the first stage. Objective and subjective evaluations on the dataset of the Low Resource Dysarthria Wake-Up Word Spotting Challenge (LRDWWS Challenge) show that the proposed approach improves speaker similarity, speech intelligibility, and naturalness for unseen speakers, which also indicates good zero-shot performance. Audio samples are available online.
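
The same-gender retrieval step of the first stage can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the feature extractor, the distance measure, or the number of neighbours, so the cosine similarity, the default `k=4`, the plain-list frame representation, and the names `same_gender_pool` and `knn_convert` are all assumptions made for illustration. In a kNN-style voice conversion pipeline, the frames would be self-supervised speech features and the converted frames would then be passed to a vocoder, which is omitted here.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector per speech frame (assumed representation)


def cosine_similarity(a: Frame, b: Frame) -> float:
    """Cosine similarity between two feature frames (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def same_gender_pool(
    references: Sequence[Tuple[str, Sequence[Frame]]], gender: str
) -> List[Frame]:
    """Collect frames only from normal-speech references with the given gender label."""
    pool: List[Frame] = []
    for ref_gender, frames in references:
        if ref_gender == gender:
            pool.extend(frames)
    return pool


def knn_convert(
    source_frames: Sequence[Frame], pool: Sequence[Frame], k: int = 4
) -> List[List[float]]:
    """Replace each source frame with the mean of its k most similar pool frames."""
    if not pool:
        raise ValueError("matching pool is empty; no same-gender reference speech")
    converted: List[List[float]] = []
    for frame in source_frames:
        # Rank all candidate frames by similarity to the current source frame.
        ranked = sorted(
            pool, key=lambda cand: cosine_similarity(frame, cand), reverse=True
        )
        neighbours = ranked[:k]
        # Average the selected neighbours dimension by dimension.
        converted.append(
            [sum(n[d] for n in neighbours) / len(neighbours) for d in range(len(frame))]
        )
    return converted
```

Restricting the pool before the nearest-neighbour search means a source frame can never be matched to a frame from a reference speaker of a different gender, even when such a frame would be closer in feature space.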

Published In

Proceedings of 2024 International Conference on Asian Language Processing, IALP 2024

DOI

10.1109/IALP63756.2024.10661160

Publication Date

January 1, 2024

Start / End Page

423 / 427
 

Citation

APA: Liu, D., Lin, Y., Bu, H., & Li, M. (2024). Two-stage and Self-supervised Voice Conversion for Zero-Shot Dysarthric Speech Reconstruction. In Proceedings of 2024 International Conference on Asian Language Processing, IALP 2024 (pp. 423–427). https://doi.org/10.1109/IALP63756.2024.10661160
Chicago: Liu, D., Y. Lin, H. Bu, and M. Li. “Two-stage and Self-supervised Voice Conversion for Zero-Shot Dysarthric Speech Reconstruction.” In Proceedings of 2024 International Conference on Asian Language Processing, IALP 2024, 423–27, 2024. https://doi.org/10.1109/IALP63756.2024.10661160.
ICMJE: Liu D, Lin Y, Bu H, Li M. Two-stage and Self-supervised Voice Conversion for Zero-Shot Dysarthric Speech Reconstruction. In: Proceedings of 2024 International Conference on Asian Language Processing, IALP 2024. 2024. p. 423–7.
MLA: Liu, D., et al. “Two-stage and Self-supervised Voice Conversion for Zero-Shot Dysarthric Speech Reconstruction.” Proceedings of 2024 International Conference on Asian Language Processing, IALP 2024, 2024, pp. 423–27. Scopus, doi:10.1109/IALP63756.2024.10661160.
NLM: Liu D, Lin Y, Bu H, Li M. Two-stage and Self-supervised Voice Conversion for Zero-Shot Dysarthric Speech Reconstruction. Proceedings of 2024 International Conference on Asian Language Processing, IALP 2024. 2024. p. 423–427.
