An exploratory study on integrating radiomics with vision transformers for enhancing medical imaging classification accuracy.

Publication, Journal Article
Yang, Z; Zhang, R; Zhu, H; Zhang, H; Wang, J; Chen, M; Yin, F-F; Wang, C
Published in: Med Phys
January 2026

BACKGROUND: Medical image analysis has witnessed substantial advancements through the recent development of deep learning (DL) algorithms. Vision Transformers (ViTs) have emerged as a powerful alternative by leveraging self-attention to model both local and global interactions. Despite their promise, ViTs are data-intensive and lack inductive biases, limiting their utility in medical imaging. Conversely, radiomics offers domain-specific, interpretable descriptors of image heterogeneity but lacks scalability and integration with deep learning. This study proposes a unified Radiomics-Embedded Vision Transformer (RE-ViT) framework that combines handcrafted radiomic features and data-driven visual embeddings within a ViT architecture. PURPOSE: To develop and evaluate a RE-ViT framework that integrates radiomics and patch-wise ViT embeddings to improve feature representation for medical image classification across heterogeneous datasets. METHODS: Following the classic ViT design, the input image was first resampled into multiple image patches. For each image patch, handcrafted radiomic features, including intensity, texture, and spatial heterogeneity descriptors, were extracted. Simultaneously, standard patch embeddings were obtained via linear projection of pixel intensities. The two embeddings were averaged, normalized, and combined with positional encodings before being tokenized and processed by a ViT encoder. A learnable token aggregates patch-level information for final classification. The model was evaluated on three publicly available datasets: BUSI (lesion malignancy diagnosis through breast ultrasound), ChestXray2017 (lung pneumonitis diagnosis through chest x-ray), and Retinal OCT (retina disease diagnosis through retinal OCT), using 10-fold cross-validation. Performance metrics included accuracy, macro area under the ROC curve (AUC), sensitivity, and specificity.
Ablation studies were implemented to assess the contribution of RE-ViT architectural components on these three clinical problems. Comparative analyses were also conducted against CNN (VGG-16, ResNet) and hybrid (TransMed) models. RESULTS: The proposed RE-ViT model demonstrated consistently robust classification performance across all three medical imaging datasets. In BUSI, RE-ViT achieved an accuracy of 0.848 ± 0.027, AUC of 0.950 ± 0.011, sensitivity of 0.796 ± 0.042, and specificity of 0.905 ± 0.020. In ChestXray2017, it yielded an accuracy of 0.950 ± 0.012, AUC of 0.989 ± 0.004, sensitivity of 0.953 ± 0.010, and specificity of 0.975 ± 0.005. In Retinal OCT, RE-ViT achieved an accuracy of 0.938 ± 0.001, AUC of 0.986 ± 0.001, sensitivity of 0.914 ± 0.023, and specificity of 0.969 ± 0.024. In the comparison studies, RE-ViT matched or outperformed the alternatives. Ablation revealed significant performance drops when removing either radiomics or projection-based embeddings. Attention map visualizations demonstrated imaging modality-specific utilization of radiomics and learned features, with improved localization of clinically relevant regions. CONCLUSIONS: The proposed radiomics-embedded vision transformer was developed for multiple image classification tasks. Current results underscore the potential of our approach to advance other transformer-based medical image classification tasks.
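The embedding-fusion step described in the METHODS section — per-patch radiomic descriptors averaged with linearly projected pixel intensities, normalized, and combined with positional encodings — can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation: the toy descriptors (mean, standard deviation, etc.), the patch size, the embedding dimension, and the simple scalar positional encoding are all assumptions made for demonstration; the paper's actual radiomic features (texture and spatial heterogeneity descriptors) and ViT encoder are far richer.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(img, p):
    """Split a square image into non-overlapping p x p patches, flattened."""
    h, w = img.shape
    return img.reshape(h // p, p, w // p, p).swapaxes(1, 2).reshape(-1, p * p)

def toy_radiomic_features(patch, dim):
    """Toy intensity descriptors per patch (illustrative stand-ins for the
    paper's radiomic features), tiled out to the embedding dimension."""
    feats = np.array([patch.mean(), patch.std(), patch.min(),
                      patch.max(), np.median(patch), np.ptp(patch)])
    reps = int(np.ceil(dim / feats.size))
    return np.tile(feats, reps)[:dim]

def fuse_embeddings(img, p=16, dim=64):
    patches = patchify(img, p)                        # (N, p*p)
    W = rng.normal(scale=0.02, size=(p * p, dim))     # linear projection weights
    proj = patches @ W                                # data-driven embeddings
    rad = np.stack([toy_radiomic_features(pt, dim) for pt in patches])
    fused = (proj + rad) / 2.0                        # average the two streams
    fused = (fused - fused.mean(1, keepdims=True)) / (fused.std(1, keepdims=True) + 1e-6)
    pos = np.arange(fused.shape[0])[:, None] / fused.shape[0]
    return fused + pos                                # add positional encoding

img = rng.random((64, 64))
tokens = fuse_embeddings(img)
print(tokens.shape)  # 16 patches, each a 64-dim fused token
```

In the full model, these fused tokens (plus a learnable classification token) would then be processed by a standard ViT encoder.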

Published In

Med Phys

DOI

10.1002/mp.70246

EISSN

2473-4209

Publication Date

January 2026

Volume

53

Issue

1

Start / End Page

e70246

Location

United States

Related Subject Headings

  • Radiomics
  • Nuclear Medicine & Medical Imaging
  • Image Processing, Computer-Assisted
  • Humans
  • Diagnostic Imaging
  • Deep Learning
  • 5105 Medical and biological physics
  • 4003 Biomedical engineering
  • 1112 Oncology and Carcinogenesis
  • 0903 Biomedical Engineering
 

Citation

APA
Yang, Z., Zhang, R., Zhu, H., Zhang, H., Wang, J., Chen, M., … Wang, C. (2026). An exploratory study on integrating radiomics with vision transformers for enhancing medical imaging classification accuracy. Med Phys, 53(1), e70246. https://doi.org/10.1002/mp.70246

Chicago
Yang, Zhenyu, Rihui Zhang, Haiming Zhu, Haipeng Zhang, Jianliang Wang, Minbin Chen, Fang-Fang Yin, and Chunhao Wang. “An exploratory study on integrating radiomics with vision transformers for enhancing medical imaging classification accuracy.” Med Phys 53, no. 1 (January 2026): e70246. https://doi.org/10.1002/mp.70246.

ICMJE
Yang Z, Zhang R, Zhu H, Zhang H, Wang J, Chen M, et al. An exploratory study on integrating radiomics with vision transformers for enhancing medical imaging classification accuracy. Med Phys. 2026 Jan;53(1):e70246.

MLA
Yang, Zhenyu, et al. “An exploratory study on integrating radiomics with vision transformers for enhancing medical imaging classification accuracy.” Med Phys, vol. 53, no. 1, Jan. 2026, p. e70246. Pubmed, doi:10.1002/mp.70246.

NLM
Yang Z, Zhang R, Zhu H, Zhang H, Wang J, Chen M, Yin F-F, Wang C. An exploratory study on integrating radiomics with vision transformers for enhancing medical imaging classification accuracy. Med Phys. 2026 Jan;53(1):e70246.
