Abstract 4368616: Core lab versus computer: Pediatric echocardiogram measurement agreement between expert human and AI readers
Edwards, L; Sharma, S; Armenian, S; Bhat, A; Blythe, N; Border, W; Boyle, P; Leger, K; Leisenring, W; Meacham, L; Nathan, P; Narasimhan, S ...
Published in: Circulation
Deep learning algorithms for automated echocardiographic measurements have demonstrated strong performance in adult populations; however, their utility in pediatric echocardiography remains unclear. We evaluated agreement between FDA-approved software for automated adult echocardiogram measurements (Us2.ai) and an expert pediatric core lab reader in assessing left ventricular (LV) size and function.
We analyzed a retrospective dataset of pediatric echocardiogram DICOM files from 5 pediatric centers, with corresponding core lab measurements, collected from childhood cancer survivors under 21 years of age. The automated software processed the DICOM files, and agreement with the core lab reader for 17 two-dimensional (2D) and Doppler measurements was assessed using the mean difference and the intraclass correlation coefficient (ICC; two-way random effects, absolute agreement, single measures).
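The abstract does not state which statistical software was used. As a minimal illustrative sketch only, the agreement statistics described here (the mean difference, or bias, and ICC(2,1): two-way random effects, absolute agreement, single measures) can be computed in Python with pandas and the pingouin package; the column names and values below are hypothetical placeholders, not study data.

```python
# Minimal sketch (not the authors' code): mean difference and ICC(2,1)
# between paired automated (AI) and core lab readings of one measurement.
# All identifiers and values are hypothetical placeholders.
import pandas as pd
import pingouin as pg

# Long-format table: one row per (echo study, reader) pair for a single variable
df = pd.DataFrame({
    "study_id": [1, 1, 2, 2, 3, 3],
    "reader":   ["core_lab", "ai", "core_lab", "ai", "core_lab", "ai"],
    "ef":       [58.0, 54.1, 49.5, 48.2, 62.3, 57.9],  # biplane EF, %
})

# Mean difference (AI minus core lab), i.e., the bias
wide = df.pivot(index="study_id", columns="reader", values="ef")
bias = (wide["ai"] - wide["core_lab"]).mean()
print(f"Mean difference (AI - core lab): {bias:.1f} percentage points")

# ICC: two-way random effects, absolute agreement, single measures = ICC(2,1),
# reported by pingouin as type "ICC2"
icc = pg.intraclass_corr(data=df, targets="study_id", raters="reader", ratings="ef")
print(icc.loc[icc["Type"] == "ICC2", ["Type", "ICC", "CI95%"]])
```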
A total of 652 echocardiograms from 153 childhood cancer survivors were included. Median age at the time of study was 13.4 years (Q1-Q3: 9.5-16.3), and 16% of studies showed depressed LV systolic function by core lab measurements (LV shortening fraction ≤28% or ejection fraction [EF] ≤50%). Table 1 summarizes the mean difference and ICC between the automated software and the core lab reader. Agreement was at least moderate (ICC > 0.5) across all variables. On average, the automated software underestimated biplane EF by 5 percentage points relative to the core lab reader, with larger mean differences observed at higher EFs (-1 percentage point for core lab EF ≤50% vs -5 percentage points for EF >50%; Figure 1).
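For clarity on how the EF-stratified bias in Figure 1 could be derived, the sketch below groups the paired AI-minus-core-lab EF differences by the same core lab EF threshold (≤50% vs >50%). It is a hypothetical illustration with placeholder numbers, not the study analysis.

```python
# Minimal sketch (hypothetical data): stratify the AI - core lab EF difference
# by the core lab EF threshold used in the abstract (<=50% vs >50%).
import pandas as pd

# One row per echo study with paired biplane EF readings (%)
wide = pd.DataFrame({
    "core_lab": [45.0, 48.0, 55.0, 60.0, 63.0],
    "ai":       [44.2, 46.9, 51.0, 54.8, 58.1],
})

wide["diff"] = wide["ai"] - wide["core_lab"]
wide["group"] = wide["core_lab"].apply(lambda ef: "EF <= 50%" if ef <= 50 else "EF > 50%")

# Mean difference (bias) within each core lab EF stratum
print(wide.groupby("group")["diff"].agg(["mean", "std", "count"]))
```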
Independent validation of automated echocardiographic measurement software in a pediatric dataset demonstrated at least moderate agreement between all automated measurements and gold-standard core lab measurements. The software exhibited a bias toward lower ejection fraction values; however, the ICC for ejection fraction was comparable to previously reported interobserver variability among human pediatric readers.