Lightweight Language Model for Speech Synthesis: Attempts and Analysis
Large-scale autoregressive text-to-speech (TTS) models can generate speech that is nearly indistinguishable from human speech. However, training such large language models (LLMs) is challenging due to memory and computational constraints. This paper describes our TTS system for the 2024 Conversational Voice Clone Challenge (CoVoC). Our approach adapts the LauraGPT model to synthesize mixed Chinese and English text by expanding the Chinese pinyin vocabulary and reducing the number of layers in the decoder-only Transformer architecture. Despite using minimal training data, our method trails the other constrained systems by only a small margin in subjective and in some objective evaluations. This paper discusses our attempt to train lightweight LLMs for zero-shot TTS and analyzes the factors contributing to the limited performance. Our audio samples are available online.