
Lightweight Language Model for Speech Synthesis: Attempts and Analysis

Conference publication
Wu, Z; Liu, D; Li, M
Published in: 2024 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024
January 1, 2024

Large-scale autoregressive text-to-speech (TTS) models can generate speech that is nearly indistinguishable from human speech. However, training large language models (LLMs) is challenging due to memory and computational constraints. This paper describes our TTS method for the 2024 Conversational Voice Clone Challenge (CoVoC). Our approach modifies the LauraGPT model to synthesize mixed Chinese and English text by expanding the Chinese pinyin vocabulary and reducing the number of layers in the decoder-only Transformer architecture. Despite using minimal training data, the performance gap between our method and other constrained systems is relatively small in both subjective and some objective evaluations. This paper discusses our attempt to train lightweight LLMs for zero-shot TTS and analyzes the factors contributing to low performance. Our audio samples can be accessed online.
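The abstract names two levers for building a lighter model: adding Chinese pinyin tokens to the vocabulary and using fewer layers in the decoder-only Transformer. The following sketch is a rough parameter-count estimate that shows how each lever changes model size. It is not the paper's configuration. The hidden size, layer counts, base vocabulary size and number of added pinyin tokens are all hypothetical. The sketch also assumes a standard decoder block (four attention projections plus a feed-forward network 4× wider than the hidden size) and ignores biases, layer norms and positional embeddings.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DecoderConfig:
    """Hypothetical decoder-only Transformer size settings (not from the paper)."""
    vocab_size: int    # text tokens (e.g. pinyin + English) plus speech tokens
    d_model: int       # hidden size
    n_layers: int      # number of decoder blocks
    ffn_mult: int = 4  # assumed FFN inner size = ffn_mult * d_model


def approx_params(cfg: DecoderConfig, tied_output: bool = True) -> int:
    """Back-of-the-envelope weight count; biases and layer norms are ignored."""
    d = cfg.d_model
    attn = 4 * d * d                # Q, K, V and output projections
    ffn = 2 * cfg.ffn_mult * d * d  # up- and down-projection matrices
    per_layer = attn + ffn          # = 12 * d^2 when ffn_mult == 4
    embed = cfg.vocab_size * d      # input embedding table
    # An untied output head adds a second vocab_size x d matrix.
    head = 0 if tied_output else cfg.vocab_size * d
    return cfg.n_layers * per_layer + embed + head


if __name__ == "__main__":
    # Hypothetical baseline and a "lightweight" variant: half the layers,
    # plus 1,500 extra (assumed) pinyin tokens in the vocabulary.
    base = DecoderConfig(vocab_size=32_000, d_model=1024, n_layers=24)
    light = replace(base, n_layers=12, vocab_size=base.vocab_size + 1_500)
    print(f"baseline:    {approx_params(base):,} weights")
    print(f"lightweight: {approx_params(light):,} weights")
```

Under these assumptions each removed layer saves 12·d² weights, while each added vocabulary token costs only d weights (2d with an untied output head), so reducing depth accounts for almost all of the size reduction and the vocabulary expansion is comparatively cheap.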


Published In

2024 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024

DOI

10.1109/ISCSLP63861.2024.10800708

Publication Date

January 1, 2024

Start / End Page

501 / 505

Citation

APA: Wu, Z., Liu, D., & Li, M. (2024). Lightweight Language Model for Speech Synthesis: Attempts and Analysis. In 2024 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024 (pp. 501–505). https://doi.org/10.1109/ISCSLP63861.2024.10800708

Chicago: Wu, Z., D. Liu, and M. Li. “Lightweight Language Model for Speech Synthesis: Attempts and Analysis.” In 2024 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024, 501–5, 2024. https://doi.org/10.1109/ISCSLP63861.2024.10800708.

ICMJE: Wu Z, Liu D, Li M. Lightweight Language Model for Speech Synthesis: Attempts and Analysis. In: 2024 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024. 2024. p. 501–5.

MLA: Wu, Z., et al. “Lightweight Language Model for Speech Synthesis: Attempts and Analysis.” 2024 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024, 2024, pp. 501–05. Scopus, doi:10.1109/ISCSLP63861.2024.10800708.

NLM: Wu Z, Liu D, Li M. Lightweight Language Model for Speech Synthesis: Attempts and Analysis. 2024 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024. 2024. p. 501–505.