ScoEHR: Generating Synthetic Electronic Health Records using Continuous-time Diffusion Models

Publication, Conference
Naseer, AA; Walker, B; Landon, C; Ambrosy, A; Fudim, M; Wysham, N; Toro, B; Swaminathan, S; Lyons, T
Published in: Proceedings of Machine Learning Research
January 1, 2023

Global access to statistically and clinically representative patient health data holds potential for advancing disease research, enhancing patient care, and accelerating drug development. However, acquiring health data such as electronic health records (EHRs) is challenging because of high costs, time constraints, and concerns about patient privacy. One approach to tackling these challenges is to use synthetic data. In this paper we introduce ScoEHR, a novel deep learning method for generating synthetic EHRs, which combines an autoencoder with a continuous-time diffusion model. ScoEHR is shown to outperform three baseline synthetic EHR generation frameworks (medGAN, medWGAN, and medBGAN) on two publicly available datasets, MIMIC-III and the Yale New Haven Health System Emergency Department dataset, based on four widely accepted metrics of data utility. Additionally, a blind clinician evaluation was carried out to assess the qualitative realism of the synthetic data generated by ScoEHR. In this evaluation, a patient’s data was labeled as ‘unrealistic’ if at least one clinician found it to be unrealistic. This evaluation showed that existing real EHR data and ScoEHR-generated synthetic data were scored as equally realistic. Our code is available at https://github.com/aanaseer/ScoEHR.
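The labeling rule used in the clinician evaluation (a record counts as 'unrealistic' if at least one clinician flags it) can be illustrated with a minimal sketch. This is not taken from the authors' repository: the function name `label_records`, the record IDs, and the vote values are all hypothetical, and the only thing drawn from the abstract is the "at least one clinician" rule itself.

```python
from typing import Dict, List


def label_records(votes: Dict[str, List[bool]]) -> Dict[str, str]:
    """Label each record 'unrealistic' if any clinician flagged it.

    `votes` maps a (hypothetical) record ID to one boolean per clinician,
    where True means that clinician judged the record unrealistic.
    """
    return {
        record_id: "unrealistic" if any(flags) else "realistic"
        for record_id, flags in votes.items()
    }


# Made-up votes from three clinicians, for illustration only.
example_votes = {
    "record_a": [False, False, False],  # no clinician flagged it
    "record_b": [False, True, False],   # one flag is enough
}
labels = label_records(example_votes)
```

Under this rule a single dissenting clinician decides the label, so it is a strict criterion: a record is only counted as realistic when every reviewer accepts it.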

Published In

Proceedings of Machine Learning Research

EISSN

2640-3498

Publication Date

January 1, 2023

Volume

219

Start / End Page

489 / 508
 

Citation

APA: Naseer, A. A., Walker, B., Landon, C., Ambrosy, A., Fudim, M., Wysham, N., … Lyons, T. (2023). ScoEHR: Generating Synthetic Electronic Health Records using Continuous-time Diffusion Models. In Proceedings of Machine Learning Research (Vol. 219, pp. 489–508).

Chicago: Naseer, A. A., B. Walker, C. Landon, A. Ambrosy, M. Fudim, N. Wysham, B. Toro, S. Swaminathan, and T. Lyons. “ScoEHR: Generating Synthetic Electronic Health Records using Continuous-time Diffusion Models.” In Proceedings of Machine Learning Research, 219:489–508, 2023.

ICMJE: Naseer AA, Walker B, Landon C, Ambrosy A, Fudim M, Wysham N, et al. ScoEHR: Generating Synthetic Electronic Health Records using Continuous-time Diffusion Models. In: Proceedings of Machine Learning Research. 2023. p. 489–508.

MLA: Naseer, A. A., et al. “ScoEHR: Generating Synthetic Electronic Health Records using Continuous-time Diffusion Models.” Proceedings of Machine Learning Research, vol. 219, 2023, pp. 489–508.

NLM: Naseer AA, Walker B, Landon C, Ambrosy A, Fudim M, Wysham N, Toro B, Swaminathan S, Lyons T. ScoEHR: Generating Synthetic Electronic Health Records using Continuous-time Diffusion Models. Proceedings of Machine Learning Research. 2023. p. 489–508.
