Improving Artificial Intelligence-based Microbial Keratitis Screening Tools Constrained by Limited Data Using Synthetic Generation of Slit-Lamp Photos.
OBJECTIVE: We developed a novel slit-lamp photography (SLP) generative adversarial network (GAN) model using limited data to supplement and improve the performance of an artificial intelligence (AI)-based microbial keratitis (MK) screening model.
DESIGN: Cross-sectional study.
SUBJECTS: Slit-lamp photographs of 67 healthy and 36 MK eyes were prospectively and retrospectively collected at a tertiary care ophthalmology clinic at a large academic institution.
METHODS: We trained the GAN model StyleGAN2-ADA on healthy and MK SLPs to generate synthetic images. To assess synthetic image quality, we performed a visual Turing test in which 3 cornea fellows were tested on their ability to identify 20 images each of (1) real healthy, (2) real diseased, (3) synthetic healthy, and (4) synthetic diseased eyes. We also used Kernel Inception Distance (KID) to quantitatively measure the realism and variation of the synthetic images. Using the same dataset used to train the GAN model, we trained 2 DenseNet121 AI models to grade SLP images as healthy or MK: (1) one with only real images and (2) one with real images supplemented with GAN-generated images.
MAIN OUTCOME MEASURES: Classification performance of the MK screening model trained with only real images compared with that of the model trained with both limited real and supplemental synthetic GAN images.
RESULTS: In the visual Turing test, the fellows on average rated synthetic images as good quality (83.3% ± 12.0% of images), and synthetic and real images were found to depict pertinent anatomy and pathology for accurate classification (96.3% ± 2.19% of images). These experts could distinguish between real and synthetic images (accuracy: 92.5% ± 9.01%). Analysis of the KID score for synthetic images indicated realism and variation. The MK screening model trained on both limited real and supplemental synthetic data (area under the receiver operating characteristic curve: 0.93, bootstrapped 95% CI: 0.77-1.0) outperformed the model trained with only real data (area under the receiver operating characteristic curve: 0.76, 95% CI: 0.50-1.0), with an improvement of 0.17 (95% CI: 0-0.4; 2-tailed t test P = 0.076).
CONCLUSIONS: Artificial intelligence-based MK classification may be improved by supplementation of limited real training data with synthetic data generated by GANs.
FINANCIAL DISCLOSURES: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
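The bootstrapped confidence intervals reported for the area under the curve can be illustrated with a minimal sketch. The abstract does not describe the resampling procedure, so the percentile bootstrap used here is an assumption, and the function names (`auc_score`, `bootstrap_auc_ci`), the number of resamples, and the toy labels and scores are all hypothetical rather than taken from the study.

```python
import numpy as np


def auc_score(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate plus percentile-bootstrap CI (assumed method, not the paper's)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    n = y_true.size
    aucs = []
    while len(aucs) < n_boot:
        # Resample eyes with replacement; skip draws that contain a single class.
        idx = rng.integers(0, n, size=n)
        if np.unique(y_true[idx]).size < 2:
            continue
        aucs.append(auc_score(y_true[idx], scores[idx]))
    lower, upper = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return auc_score(y_true, scores), float(lower), float(upper)


if __name__ == "__main__":
    # Hypothetical held-out labels (1 = MK, 0 = healthy) and model scores.
    labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
    preds = [0.1, 0.2, 0.3, 0.35, 0.6, 0.4, 0.55, 0.7, 0.8, 0.9]
    auc, lower, upper = bootstrap_auc_ci(labels, preds)
    print(f"AUC = {auc:.2f}, 95% CI: {lower:.2f}-{upper:.2f}")
```

With a small test set, many resamples separate the classes perfectly, so the upper bound often sits at 1.0 and the interval is wide, which is consistent with the broad intervals reported above.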