A Generalized Machine Learning Model for Identifying Congenital Heart Defects (CHDs) Using ICD Codes.
BACKGROUND: International Classification of Diseases (ICD) codes used for congenital heart defect (CHD) case identification in datasets have substantial false-positive (FP) rates. Incorporating machine learning (ML) algorithms following case selection by ICD codes may improve the accuracy of CHD identification, enhancing surveillance efforts. METHODS: Traditional ML methods were applied to four encounter-level datasets, 2010-2019, for 3334 patients with validated diagnoses and at least one CHD ICD code identified. A 5-fold cross-validation approach was applied to the dataset to determine the set of overlapping important features best classifying CHD cases. Training and testing combinations were explored to determine the approach yielding the most accurate CHD classification. RESULTS: CHD ICD positive predictive values (PPVs) by site ranged from 53.2% to 84.0%. The ML algorithm achieved a PPV of 95% (1273/1340) for the four-site dataset with a false-negative (FN) rate of 33% (639/1912) by choosing an operating point prioritizing PPV from the PPV-FN rate curve. XGBoost reduced 2105 Clinical Classification Software (CCS) features to 137 that identified those with true-positive (TP) CHD and FP classification. CONCLUSION: Applying ML algorithms following case selection by CHD-related ICD codes improved the accuracy of identifying TP CHD cases.
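The reported PPV and FN rate follow directly from the counts in the abstract, and the idea of choosing an operating point that prioritizes PPV can be illustrated with a small sketch. The counts below are taken from the abstract; the `pick_threshold` helper, the toy scores, and the toy labels are hypothetical and are not the authors' procedure or data.

```python
def ppv(tp, fp):
    """Positive predictive value: share of flagged cases that are true CHD."""
    return tp / (tp + fp)


def fn_rate(fn, tp):
    """False-negative rate: share of true CHD cases the classifier misses."""
    return fn / (fn + tp)


# Counts reported in the abstract for the four-site dataset.
flagged = 1340          # cases the ML algorithm classified as CHD
tp = 1273               # of those, validated true positives
fp = flagged - tp       # 67 false positives
true_cases = 1912       # all validated CHD cases
fn = true_cases - tp    # 639 missed cases

reported_ppv = ppv(tp, fp)          # about 0.95
reported_fn_rate = fn_rate(fn, tp)  # about 0.33


def pick_threshold(scores, labels, min_ppv):
    """Hypothetical helper: among score thresholds whose PPV meets min_ppv,
    return (threshold, ppv, fn_rate) with the lowest FN rate, or None."""
    best = None
    for t in sorted(set(scores)):
        pred = [s >= t for s in scores]
        n_tp = sum(1 for p, y in zip(pred, labels) if p and y == 1)
        n_fp = sum(1 for p, y in zip(pred, labels) if p and y == 0)
        n_fn = sum(1 for p, y in zip(pred, labels) if not p and y == 1)
        if n_tp == 0:
            continue  # no true positives flagged at this threshold
        point = (t, ppv(n_tp, n_fp), fn_rate(n_fn, n_tp))
        if point[1] >= min_ppv and (best is None or point[2] < best[2]):
            best = point
    return best


# Toy classifier scores and validated labels (illustrative only).
toy_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 1, 1, 0, 0]
operating_point = pick_threshold(toy_scores, toy_labels, min_ppv=0.95)
```

In the toy data, raising the threshold until PPV reaches the target costs sensitivity, which mirrors the trade-off described in the abstract: a high PPV is obtained by accepting a higher FN rate.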
Related Subject Headings
- Male
- Machine Learning
- International Classification of Diseases
- Infant, Newborn
- Infant
- Humans
- Heart Defects, Congenital
- Female
- Algorithms