Predicting atrial fibrillation and flutter using electronic health records.
Electronic health records (EHRs) contain large amounts of information that could be used to build models for predicting disease onset. In this study, we investigated the use of free-text and coded data in Marshfield Clinic's EHR, individually and in combination, to build machine-learning-based models that predict the first-ever episode of atrial fibrillation and/or atrial flutter (AFF). We trained and evaluated our AFF models on EHR data from different time intervals (1, 3, 5 and all years) prior to the first documented onset of AFF. We applied several machine learning methods, including naïve Bayes, support vector machines (SVM), logistic regression and random forests, to build AFF prediction models and evaluated them using 10-fold cross-validation. On text-based datasets, the best model achieved an F-measure of 60.1%, when applied exclusively to coded data. The combination of textual and coded data achieved comparable performance. The study results attest to the relative merit of using textual data to complement coded data in disease onset prediction modeling.
Karnik, S; Tan, SL; Berg, B; Glurich, I; Zhang, J; Vidaillet, HJ; Page, CD; Chowdhary, R
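The evaluation protocol named in the abstract (10-fold cross-validation scored by F-measure) can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the function names, the interleaved unstratified fold assignment, and the `fit_predict` callback are all assumptions made for illustration, and the score is returned as a fraction rather than a percentage.

```python
from typing import Callable, List, Sequence, Tuple


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F-measure (harmonic mean of precision and recall) for the class labelled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; fold i holds indices i, i+k, i+2k, ..."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, folds[i]))
    return splits


def cross_validated_f_measure(
    features: Sequence,
    labels: Sequence[int],
    fit_predict: Callable[[list, list, list], List[int]],
    k: int = 10,
) -> float:
    """Mean F-measure over k folds.

    fit_predict(train_x, train_y, test_x) is a hypothetical callback that
    trains any classifier on the training fold and returns 0/1 predictions
    for the test fold.
    """
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        predicted = fit_predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in test],
        )
        scores.append(f_measure([labels[i] for i in test], predicted))
    return sum(scores) / len(scores)
```

Any of the classifiers listed in the abstract could be plugged in through `fit_predict`. Because this sketch does not stratify the folds, a fold with no positive cases scores 0.0, which lowers the mean on imbalanced data. The abstract does not say whether stratification was used.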