Scholars@Duke publication: A four-alternative forced choice (4AFC) methodology for evaluating microcalcification detection in clinical full-field digital mammography (FFDM) and digital breast tomosynthesis (DBT) systems using an inkjet-printed anthropomorphic phantom.

Publication, Journal Article

Ikejimba, LC; Salad, J; Graff, CG; Ghammraoui, B; Cheng, W-C; Lo, JY; Glick, SJ

Published in: Med Phys

September 2019

PURPOSE: The advent of three-dimensional breast imaging systems such as digital breast tomosynthesis (DBT) holds great promise for improving the detection and diagnosis of breast cancer. With these new technologies comes an essential need for testing methods to assess the resultant image quality. Although randomized clinical trials are the gold standard for assessing image quality, phantom-based studies can provide a simpler and less burdensome approach. In this work, a complete framework is presented for task-based evaluation of microcalcification (MC) detection performance for DBT imaging systems.
METHODS: The framework consists of three parts. The first part is a realistic anthropomorphic physical breast phantom created through inkjet printing with parchment paper and iodine-doped ink. The second part is a method for inserting realistic MCs fabricated from calcium hydroxyapatite. The reproducibility and stability of the phantom materials were investigated using multiple samples of parchment and ink over 6 months. The final part is an analysis using a four-alternative forced choice (4AFC) reader study. To demonstrate the framework, a task-based 4AFC study was conducted using a clinical system to compare performance among DBT, synthetic mammography (SM), and full-field digital mammography (FFDM). Nine human observers read images containing MC clusters imaged with all three modalities and attempted to locate the MCs. The proportion correct (PC) was measured as the number of correctly detected clusters out of all trials.
RESULTS: Overall, readers scored highest with FFDM (PC = 0.95 ± 0.03), followed by DBT (0.85 ± 0.04) and finally SM (0.44 ± 0.06). For the parchment and ink samples, the linear attenuation properties were very stable over 6 months. In addition, little difference was found between the various parchment and ink samples, indicating good reproducibility.
CONCLUSIONS: This framework presents a promising methodology for evaluating diagnostic task performance of clinical breast DBT systems.
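
As a rough illustration of the PC metric described in METHODS, the sketch below scores a 4AFC reader study in Python. It is not the authors' analysis code. The function names and the toy responses are hypothetical, and because the abstract does not say what the ± values represent, summarizing readers by the standard error of the mean is purely an assumption made here.

```python
import math
import statistics

# In a 4AFC trial the reader picks one of four candidate locations,
# so a reader who guesses at random is expected to score PC = 0.25.
N_ALTERNATIVES = 4
CHANCE_PC = 1.0 / N_ALTERNATIVES


def proportion_correct(responses, truths):
    """Return the fraction of trials in which the chosen alternative
    matches the true signal location (hypothetical helper)."""
    if len(responses) != len(truths) or not responses:
        raise ValueError("responses and truths must be non-empty and equal length")
    correct = sum(1 for r, t in zip(responses, truths) if r == t)
    return correct / len(responses)


def mean_and_sem(reader_pcs):
    """Summarize per-reader PC values as (mean, standard error of the mean).

    Assumption: the abstract does not state how its uncertainties were
    computed; SEM across readers is used only as one plausible choice.
    """
    mean = statistics.mean(reader_pcs)
    if len(reader_pcs) < 2:
        return mean, 0.0
    sem = statistics.stdev(reader_pcs) / math.sqrt(len(reader_pcs))
    return mean, sem


if __name__ == "__main__":
    # Made-up responses for two readers over four trials (alternatives 0..3).
    truths = [0, 2, 1, 3]
    readers = {
        "reader_a": [0, 2, 1, 1],
        "reader_b": [0, 2, 1, 3],
    }
    pcs = [proportion_correct(resp, truths) for resp in readers.values()]
    mean_pc, sem_pc = mean_and_sem(pcs)
    print(f"PC = {mean_pc:.2f} ± {sem_pc:.2f} (chance = {CHANCE_PC:.2f})")
```

Against the chance level of 0.25, the reported SM score of 0.44 is much closer to guessing than the FFDM and DBT scores of 0.95 and 0.85.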