Thyroid Nodules on Ultrasound in Children and Young Adults: Comparison of Diagnostic Performance of Radiologists' Impressions, ACR TI-RADS, and a Deep Learning Algorithm.
BACKGROUND. In current clinical practice, thyroid nodules in children are generally evaluated on the basis of radiologists' overall impressions of ultrasound images.
OBJECTIVE. The purpose of this article is to compare the diagnostic performance of radiologists' overall impression, the American College of Radiology Thyroid Imaging Reporting and Data System (ACR TI-RADS), and a deep learning algorithm in differentiating benign and malignant thyroid nodules on ultrasound in children and young adults.
METHODS. This retrospective study included 139 patients (median age 17.5 years; 119 female patients, 20 male patients) evaluated from January 1, 2004, to September 18, 2020, who were 21 years old or younger and had a thyroid nodule on ultrasound with definitive pathologic results from fine-needle aspiration and/or surgical excision to serve as the reference standard. A single nodule per patient was selected, and one transverse and one longitudinal image of each nodule were extracted for further evaluation. Three radiologists independently characterized nodules on the basis of their overall impression (benign vs malignant) and ACR TI-RADS. A previously developed deep learning algorithm determined for each nodule a likelihood of malignancy, which was used to derive a risk level. Sensitivities and specificities for malignancy were calculated. Agreement was assessed using Cohen kappa coefficients.
RESULTS. For radiologists' overall impression, sensitivity ranged from 32.1% to 75.0% (mean, 58.3%; 95% CI, 49.2-67.3%), and specificity ranged from 63.8% to 93.9% (mean, 79.9%; 95% CI, 73.8-85.7%). For ACR TI-RADS, sensitivity ranged from 82.1% to 87.5% (mean, 85.1%; 95% CI, 77.3-92.1%), and specificity ranged from 47.0% to 54.2% (mean, 50.6%; 95% CI, 41.4-59.8%). The deep learning algorithm had a sensitivity of 87.5% (95% CI, 78.3-95.5%) and specificity of 36.1% (95% CI, 25.6-46.8%). Interobserver agreement among pairwise combinations of readers, expressed as kappa, was 0.227-0.472 for overall impression and 0.597-0.643 for ACR TI-RADS.
CONCLUSION. Both ACR TI-RADS and the deep learning algorithm had higher sensitivity, albeit lower specificity, compared with overall impressions. The deep learning algorithm had similar sensitivity but lower specificity than ACR TI-RADS. Interobserver agreement was higher for ACR TI-RADS than for overall impressions.
CLINICAL IMPACT. ACR TI-RADS and the deep learning algorithm may serve as potential alternative strategies for guiding decisions to perform fine-needle aspiration of thyroid nodules in children.
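The metrics named in the METHODS section (sensitivity, specificity, and pairwise Cohen kappa) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's analysis code: the function names and the ten toy labels are hypothetical, with 1 standing for malignant and 0 for benign, and none of the values are study data.

```python
from typing import List, Tuple


def sensitivity_specificity(truth: List[int], pred: List[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = malignant)."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def cohen_kappa(rater_a: List[int], rater_b: List[int]) -> float:
    """Cohen kappa for two raters labeling the same items."""
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Chance agreement: product of each rater's marginal label rates.
    labels = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data (not from the study): pathology reference and two readers.
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
reader_a = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
reader_b = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]

sens_a, spec_a = sensitivity_specificity(reference, reader_a)
sens_b, spec_b = sensitivity_specificity(reference, reader_b)
kappa_ab = cohen_kappa(reader_a, reader_b)
```

In a three-reader design such as the one described, the kappa function would be applied to each of the three reader pairs, which yields a range of values like the ones reported in RESULTS.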
Related Subject Headings
- Young Adult
- Ultrasonography
- Thyroid Nodule
- Retrospective Studies
- Radiologists
- Nuclear Medicine & Medical Imaging
- Male
- Humans
- Female
- Deep Learning