Optimizing Deep Neural Networks for EEG-Based Speech Recognition: A Multimodal Approach to Assistive Communication.
Speech recognition for individuals with speech impairments remains a significant challenge because atypical speech patterns confound traditional acoustic-only models. This study introduces NeuroSpeech, a novel multimodal framework that integrates electroencephalography (EEG) with acoustic features to improve recognition accuracy, robustness, and efficiency. A large-scale random search identified optimal EEG encoder configurations and feature extraction parameters, with window size and overlap ($p < 0.001$) emerging as critical factors. Explainable AI (XAI) methods, specifically SHAP, provided insights into model decision-making, supporting interpretability and clinical translation. Evaluations were conducted on two publicly available datasets: Spanish commands and vowels (UNLP-CONICET) and English phonemes and words (KaraOne). Under clean conditions, NeuroSpeech achieved near-perfect accuracy ($F1 = 0.986$ on Spanish; $F1 = 0.837$ on English), while in noisy conditions (SNR = 0.5) it maintained strong performance ($F1 = 0.92$ and $F1 = 0.70$, respectively), demonstrating EEG's role as a noise-robust complementary signal. In contrast, Whisper, a state-of-the-art ASR model, showed severe degradation under noise (e.g., $F1$ dropping from 0.81 to 0.46). Finally, complexity analysis showed that NeuroSpeech is lightweight (1-30M parameters) with an inference latency of 10-18 ms/sample (RTF $< 1$ on CPU and GPU), enabling near-real-time deployment. These results demonstrate NeuroSpeech's significant potential to leverage neural information to augment compromised speech, offering a promising advancement for assistive technologies and improved communication for individuals with speech disorders.
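The abstract names window size and overlap as critical feature extraction parameters and reports a real-time factor (RTF) below 1. The sketch below illustrates both ideas in general terms only; it is not the NeuroSpeech implementation. The function names `sliding_windows` and `real_time_factor`, the overlap expressed as a fraction of the window, and the 1-second segment duration in the usage lines are all assumptions made for illustration.

```python
from typing import List, Tuple


def sliding_windows(n_samples: int, window_size: int, overlap: float) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs for fixed-size windows.

    `overlap` is the fraction of each window shared with the next one
    (an assumed convention; the source does not specify how overlap is defined).
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Hop between consecutive window starts; at least one sample.
    step = max(1, int(round(window_size * (1.0 - overlap))))
    # Only full windows are kept; a trailing partial window is dropped.
    return [(s, s + window_size) for s in range(0, n_samples - window_size + 1, step)]


def real_time_factor(processing_seconds: float, signal_seconds: float) -> float:
    """RTF = processing time / signal duration; values below 1 are faster than real time."""
    if signal_seconds <= 0:
        raise ValueError("signal_seconds must be positive")
    return processing_seconds / signal_seconds


# Ten samples, windows of four samples, 50% overlap (hypothetical values).
print(sliding_windows(10, 4, 0.5))   # [(0, 4), (2, 6), (4, 8), (6, 10)]

# 18 ms of processing for an assumed 1-second segment (duration is hypothetical).
print(real_time_factor(0.018, 1.0))  # 0.018
```

Under these assumptions, a larger overlap yields more windows per recording at the cost of more computation, which is one reason the two parameters are usually tuned together.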
Related Subject Headings
- Speech Recognition Software
- Signal Processing, Computer-Assisted
- Neural Networks, Computer
- Male
- Humans
- Electroencephalography
- Deep Learning
- Communication Devices for People with Disabilities
- Adult