SMIIP-NV: A Multi-Annotation Non-Verbal Expressive Speech Corpus in Mandarin for LLM-Based Speech Synthesis

Publication, Conference
Wu, Z; Liu, D; Liu, J; Wang, Y; Li, L; Jin, L; Bu, H; Zhang, P; Li, M
Published in: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-located with MM 2025
October 27, 2025

In natural language communication, emotions are often conveyed through non-verbal sounds (NVs) such as laughter, crying, and coughing. However, most existing text-to-speech (TTS) corpora lack annotations for these sounds, so few systems are capable of generating them. To address this gap, we introduce SMIIP-NV, a non-verbal speech synthesis corpus annotated with both emotions and non-verbal sounds, including laughter, crying, and coughing. To the best of our knowledge, SMIIP-NV is the largest publicly available open-source expressive speech corpus that includes non-verbal speech and rich annotations. It comprises 33 hours of speech data covering five distinct emotions and three types of non-verbal sounds, with detailed transcriptions and precise timestamps for each occurrence of a non-verbal sound. Additionally, the corpus provides annotations for speech segments that contain laughter or crying. To demonstrate the utility of this dataset, we establish a baseline for non-verbal speech synthesis using a lightweight large language model (LLM). The SMIIP-NV dataset and static audio demonstrations are publicly available at https://axunyii.github.io/SMIIP-NV. Interactive real-time demonstrations can be accessed at https://huggingface.co/spaces/xunyi/SMIIP-NV_Finetuned_CosyVoice2.

Published In

MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-located with MM 2025

DOI

10.1145/3746027.3758312

Publication Date

October 27, 2025

Start / End Page

12564 / 12570
 

Citation

APA: Wu, Z., Liu, D., Liu, J., Wang, Y., Li, L., Jin, L., … Li, M. (2025). SMIIP-NV: A Multi-Annotation Non-Verbal Expressive Speech Corpus in Mandarin for LLM-Based Speech Synthesis. In MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-located with MM 2025 (pp. 12564–12570). https://doi.org/10.1145/3746027.3758312

Chicago: Wu, Z., D. Liu, J. Liu, Y. Wang, L. Li, L. Jin, H. Bu, P. Zhang, and M. Li. “SMIIP-NV: A Multi-Annotation Non-Verbal Expressive Speech Corpus in Mandarin for LLM-Based Speech Synthesis.” In MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-located with MM 2025, 12564–70, 2025. https://doi.org/10.1145/3746027.3758312.

ICMJE: Wu Z, Liu D, Liu J, Wang Y, Li L, Jin L, et al. SMIIP-NV: A Multi-Annotation Non-Verbal Expressive Speech Corpus in Mandarin for LLM-Based Speech Synthesis. In: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-located with MM 2025. 2025. p. 12564–70.

MLA: Wu, Z., et al. “SMIIP-NV: A Multi-Annotation Non-Verbal Expressive Speech Corpus in Mandarin for LLM-Based Speech Synthesis.” MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-located with MM 2025, 2025, pp. 12564–70. Scopus, doi:10.1145/3746027.3758312.

NLM: Wu Z, Liu D, Liu J, Wang Y, Li L, Jin L, Bu H, Zhang P, Li M. SMIIP-NV: A Multi-Annotation Non-Verbal Expressive Speech Corpus in Mandarin for LLM-Based Speech Synthesis. MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-located with MM 2025. 2025. p. 12564–12570.
