A Data-Centric Analysis of the Impact of Training Data Quality vs. Quantity on P300 Brain-Computer Interface Performance (Student Abstract)
The current standard for training brain-computer interface (BCI) machine learning models is user-specific calibration. There is strong interest in developing generic models trained on data from other users to minimize BCI calibration time; however, this approach is limited by noisy, non-stationary brain signals and high inter-user variability. We investigate the trade-off between training data quality and quantity for P300 BCI performance in individuals with amyotrophic lateral sclerosis (ALS), using a representative traditional machine learning model (stepwise linear discriminant analysis, SWLDA) and a deep learning model (EEGNet). Results show that data quality and domain alignment matter more than dataset size: user-specific models trained on substantially less data outperformed generic models; generic models trained on ALS data outperformed those trained on non-ALS data; block-averaging of features was mostly detrimental to EEGNet but beneficial to SWLDA; and accounting for inter-stimulus interval differences between ALS and non-ALS data had minimal effect. Our findings highlight the importance of individualized model tuning for reliable P300 BCIs.