Improving the Efficiency of Clinical Trial Recruitment Using an Ensemble Machine Learning to Assist With Eligibility Screening
Objective: Efficiently identifying eligible patients is a crucial first step for a successful clinical trial. The objective of this study was to test whether an approach using electronic health record (EHR) data and an ensemble machine learning algorithm incorporating billing codes and data from clinical notes processed by natural language processing (NLP) can improve the efficiency of eligibility screening. Methods: We studied patients screened for a clinical trial of rheumatoid arthritis (RA) with one or more International Classification of Diseases (ICD) code for RA and age greater than 35 years, from a tertiary care center and a community hospital. The following three groups of EHR features were considered for the algorithm: 1) structured features, 2) the counts of NLP concepts from notes, 3) health care utilization. All features were linked to dates. We applied random forest and logistic regression with least absolute shrinkage and selection operator penalty against the following two standard approaches: 1) one or more RA ICD code and no ICD codes related to exclusion criteria (Screen
Duke Scholars
Published In
DOI
EISSN
Publication Date
Volume
Issue
Start / End Page
Related Subject Headings
- 3202 Clinical sciences
Citation
Published In
DOI
EISSN
Publication Date
Volume
Issue
Start / End Page
Related Subject Headings
- 3202 Clinical sciences