Quantitative Evaluation of Artificial Intelligence-Based Organ Segmentation Across Multiple Anatomic Sites Using 8 Commercial Software Platforms.
PURPOSE: This study aims to evaluate variability in organ-at-risk (OAR) segmentation across 8 commercial artificial intelligence (AI)-based segmentation software platforms using independent multi-institutional data sets, and to provide recommendations for clinical practice using AI segmentation.
METHODS AND MATERIALS: A total of 160 planning computed tomography image sets from 4 anatomic sites (head and neck, thorax, abdomen, and pelvis) were retrospectively pooled from 3 institutions. Contours for 31 OARs generated by the software were compared to clinical contours using multiple accuracy metrics, including Dice similarity coefficient (DSC), 95th percentile Hausdorff distance, and surface DSC, as well as relative added path length as an efficiency metric. A 2-factor analysis of variance was used to quantify variability in contouring accuracy across software platforms (intersoftware) and patients (interpatient). Pairwise comparisons were performed to categorize the software into different performance groups, and intersoftware variations were calculated as the average performance differences between the groups.
RESULTS: Significant intersoftware and interpatient contouring accuracy variations (P < .05) were observed for most OARs. The largest intersoftware variations in DSC in each anatomic region were observed for the cervical esophagus (0.41), trachea (0.10), spinal cord (0.13), and prostate (0.17). Among the organs evaluated, 7 had mean DSC > 0.9 (eg, heart, liver), and 15 had DSC ranging from 0.7 to 0.89 (eg, parotid, esophagus). The remaining organs (eg, optic nerves, seminal vesicle) had DSC < 0.7. Of the 31 organs, 16 (52%) had relative added path length less than 0.1.
CONCLUSIONS: Our results reveal significant intersoftware and interpatient variability in the performance of AI-segmentation software. These findings highlight the need for thorough software commissioning, testing, and quality assurance across disease sites, patient-specific anatomies, and image acquisition protocols.
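For readers unfamiliar with the primary accuracy metric, the Dice similarity coefficient can be sketched as DSC = 2|A ∩ B| / (|A| + |B|), where A and B are the voxel sets of the AI-generated and clinical contours. The snippet below is a minimal illustration under that standard definition, not the study's actual evaluation code; the function names, the toy masks, and the convention of returning 1.0 for two empty contours are assumptions made here for illustration.

```python
def mask_to_voxels(mask):
    """Convert a 2D binary mask (list of rows) to a set of (row, col) indices."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def dice_similarity(auto_voxels, ref_voxels):
    """Return DSC = 2|A ∩ B| / (|A| + |B|) for two sets of voxel indices."""
    total = len(auto_voxels) + len(ref_voxels)
    if total == 0:
        # Assumed convention: two empty contours are treated as perfect agreement.
        return 1.0
    return 2.0 * len(auto_voxels & ref_voxels) / total


# Toy example (hypothetical data): the clinical contour covers 4 pixels,
# the AI contour covers 3 of those same pixels.
reference = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
auto = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

dsc = dice_similarity(mask_to_voxels(auto), mask_to_voxels(reference))
print(f"DSC = {dsc:.3f}")  # 2*3 / (3+4) = 0.857
```

A DSC of 1 indicates identical contours and 0 indicates no overlap. Because the metric is volume-based, small organs such as the optic nerves are penalized more heavily for a given boundary error than large organs such as the liver.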
Related Subject Headings
- Tomography, X-Ray Computed
- Software
- Retrospective Studies
- Radiotherapy Planning, Computer-Assisted
- Organs at Risk
- Male
- Image Processing, Computer-Assisted
- Humans
- Head and Neck Neoplasms
- Artificial Intelligence