Ming Li

Professor of Electrical and Computer Engineering at Duke Kunshan University
DKU Faculty

Overview


Ming Li received his Ph.D. in Electrical Engineering from the University of Southern California in 2013. He is currently a Professor of Electrical and Computer Engineering in the Division of Natural and Applied Science and a Principal Research Scientist at the Digital Innovation Research Center at Duke Kunshan University. He is also an Adjunct Professor at the School of Computer Science of Wuhan University. His research interests are in audio, speech and language processing, as well as multimodal behavior signal analysis and interpretation. He has published more than 170 papers and has served as a member of the IEEE Speech and Language Technical Committee and the APSIPA Speech and Language Processing Technical Committee. He was the area chair for speaker and language recognition at Interspeech 2016, Interspeech 2018, Interspeech 2020 and SLT 2022. He served as technical program co-chair of Odyssey 2022 and ASRU 2023. He is an editorial member of Computer Speech and Language and APSIPA Transactions on Signal and Information Processing. Works co-authored with his colleagues have won first prize awards at the Interspeech Computational Paralinguistic Challenge 2011, 2012 and 2019, the ASRU 2019 MGB-5 ADI challenge, the Interspeech 2020 and 2021 Fearless Steps challenge, the VoxSRC 2021, 2022 and 2023 challenges, the ASVspoof21 challenge, the ICASSP22 M2Met challenge, the ICASSP23 MISP challenge and the IJCAI ADD2023 challenge. He received the IBM Faculty Award in 2016, the ISCA Computer Speech and Language best journal paper award in 2018 and the youth achievement award of outstanding achievements of scientific research in higher education in 2020.

Current Appointments & Affiliations


Professor of Electrical and Computer Engineering at Duke Kunshan University · 2024 - Present DKU Faculty

Recent Publications


Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection

Journal Article · Computer Speech and Language · November 1, 2025

Determining “who spoke what and when” remains challenging in real-world applications. In typical scenarios, Speaker Diarization (SD) is employed to address the problem of “who spoke when”, while Target Speaker Extraction (TSE) or Target Speaker Automatic S ...

Location-Guided Head Pose Estimation for Fisheye Image

Journal Article · IEEE Transactions on Cognitive and Developmental Systems · January 1, 2025

Camera with a fisheye or ultra-wide lens covers a wide field of view that cannot be modeled by the perspective projection. Serious fisheye lens distortion in the peripheral region of the image leads to degraded performance of the existing head pose estimat ...

TMCSpeech: A Chinese TV and Movie Speech Dataset with Character Descriptions and a Character-Based Voice Generation Model

Conference · Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics · January 1, 2025

Recent research on text-guided speech synthesis has sparked considerable interest. This study explores the potential of leveraging publicly available internet video data for speech synthesis and character-based new voice generation. We introduce a multi-mo ...

Education, Training & Certifications


University of Southern California · 2013 Ph.D.