Optimizing Deep Neural Networks for EEG-Based Speech Recognition: A Multimodal Approach to Assistive Communication.

Publication, Journal Article
Das, A; Soni, P; Zhao, H; Huang, M-C; Xu, W
Published in: IEEE journal of biomedical and health informatics
December 2025

Speech recognition for individuals with impairments remains a significant challenge due to atypical speech patterns that confound traditional acoustic-only models. This study introduces NeuroSpeech, a novel multimodal framework that integrates electroencephalography (EEG) with acoustic features to improve recognition accuracy, robustness, and efficiency. A large-scale random search identified optimal EEG encoder configurations and feature extraction parameters, with window size and overlap ($p < 0.001$) emerging as critical factors. Explainable AI (XAI) methods, specifically SHAP, provided insights into model decision-making, supporting interpretability and clinical translation. Evaluations were conducted on two publicly available datasets: Spanish commands and vowels (UNLP-CONICET) and English phonemes and words (KaraOne). Under clean conditions, NeuroSpeech achieved near-perfect accuracy ($F1 = 0.986$ on Spanish; $F1 = 0.837$ on English), while in noisy conditions (SNR = 0.5) it maintained strong performance ($F1 = 0.92$ and $0.70$, respectively), demonstrating EEG's role as a noise-robust complementary signal. In contrast, Whisper, a state-of-the-art ASR model, showed severe degradation under noise (e.g., $F1$ dropping from 0.81 to 0.46). Finally, complexity analysis showed that NeuroSpeech is lightweight (1-30M parameters) with an inference latency of 10-18 ms/sample (RTF $< 1$ on CPU and GPU), enabling near-real-time deployment. These results demonstrate NeuroSpeech's significant potential to leverage neural information to augment compromised speech, offering a promising advancement for assistive technologies and improved communication for individuals with speech disorders.
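The abstract names window size and overlap as critical feature-extraction parameters and reports a real-time factor (RTF) below 1. The following is a minimal Python sketch of how those two quantities are commonly defined, not the authors' implementation. The function names `sliding_windows` and `real_time_factor`, and all numeric values in the usage lines, are hypothetical and chosen only for illustration.

```python
def sliding_windows(n_samples, window_size, overlap):
    """Return (start, end) index pairs for fixed-length windows.

    `overlap` is the fraction of each window shared with the next one
    (0 <= overlap < 1). Trailing samples that do not fill a whole
    window are dropped. This convention is an assumption of the sketch.
    """
    if window_size <= 0 or not 0 <= overlap < 1:
        raise ValueError("need window_size > 0 and 0 <= overlap < 1")
    # Hop between consecutive window starts; at least one sample.
    step = max(1, int(round(window_size * (1 - overlap))))
    return [(start, start + window_size)
            for start in range(0, n_samples - window_size + 1, step)]


def real_time_factor(processing_seconds, signal_seconds):
    """RTF = processing time / signal duration; values below 1 mean
    the signal is processed faster than it is produced."""
    if signal_seconds <= 0:
        raise ValueError("signal_seconds must be positive")
    return processing_seconds / signal_seconds


# Hypothetical usage: 10 samples, 4-sample windows, 50% overlap.
windows = sliding_windows(10, 4, 0.5)

# Hypothetical usage: 18 ms of compute for an assumed 1 s input segment.
rtf = real_time_factor(0.018, 1.0)
```

Larger overlap yields more windows per recording at a higher compute cost, which is one reason such parameters are natural candidates for a random search.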

Published In

IEEE journal of biomedical and health informatics

DOI

10.1109/jbhi.2025.3618998

EISSN

2168-2208

ISSN

2168-2194

Publication Date

December 2025

Volume

29

Issue

12

Start / End Page

8735 / 8742

Related Subject Headings

  • Speech Recognition Software
  • Signal Processing, Computer-Assisted
  • Neural Networks, Computer
  • Male
  • Humans
  • Electroencephalography
  • Deep Learning
  • Communication Devices for People with Disabilities
  • Adult
 

Citation

APA: Das, A., Soni, P., Zhao, H., Huang, M.-C., & Xu, W. (2025). Optimizing Deep Neural Networks for EEG-Based Speech Recognition: A Multimodal Approach to Assistive Communication. IEEE Journal of Biomedical and Health Informatics, 29(12), 8735–8742. https://doi.org/10.1109/jbhi.2025.3618998
Chicago: Das, Anarghya, Puru Soni, Hubin Zhao, Ming-Chun Huang, and Wenyao Xu. “Optimizing Deep Neural Networks for EEG-Based Speech Recognition: A Multimodal Approach to Assistive Communication.” IEEE Journal of Biomedical and Health Informatics 29, no. 12 (December 2025): 8735–42. https://doi.org/10.1109/jbhi.2025.3618998.
ICMJE: Das A, Soni P, Zhao H, Huang M-C, Xu W. Optimizing Deep Neural Networks for EEG-Based Speech Recognition: A Multimodal Approach to Assistive Communication. IEEE journal of biomedical and health informatics. 2025 Dec;29(12):8735–42.
MLA: Das, Anarghya, et al. “Optimizing Deep Neural Networks for EEG-Based Speech Recognition: A Multimodal Approach to Assistive Communication.” IEEE Journal of Biomedical and Health Informatics, vol. 29, no. 12, Dec. 2025, pp. 8735–42. Epmc, doi:10.1109/jbhi.2025.3618998.
NLM: Das A, Soni P, Zhao H, Huang M-C, Xu W. Optimizing Deep Neural Networks for EEG-Based Speech Recognition: A Multimodal Approach to Assistive Communication. IEEE journal of biomedical and health informatics. 2025 Dec;29(12):8735–8742.
