
CXR-TFT: Multi-modal Temporal Fusion Transformer for Predicting Chest X-Ray Trajectories

Publication, Conference
Arora, M; Ali, A; Wu, K; Davis, C; Shimazui, T; Alwakeel, M; Moas, V; Yang, P; Esper, A; Kamaleswaran, R
Published in: Lecture Notes in Computer Science
January 1, 2026

In intensive care units (ICUs), patients with complex clinical conditions require vigilant monitoring and prompt interventions. Chest X-rays (CXRs) are a vital diagnostic tool, providing insights into clinical trajectories, but their irregular acquisition limits their utility. Existing tools for CXR interpretation are constrained by cross-sectional analysis, failing to capture temporal dynamics. To address this, we introduce CXR-TFT, a novel multi-modal framework that integrates temporally sparse CXR imaging and radiology reports with high-frequency clinical data (such as vital signs, laboratory values, and respiratory flow sheets) to predict the trajectory of CXR findings in critically ill patients. CXR-TFT leverages latent embeddings from a vision encoder that are temporally aligned with hourly clinical data through interpolation. A transformer is trained to predict CXR embeddings at each hour, conditioned on previous CXR embeddings and clinical measurements. In a retrospective study of 20,000 ICU patients, CXR-TFT demonstrated 95% accuracy in predicting abnormal CXR findings 12 h before they became radiographically evident, indicating that clinical data contains valuable information about respiratory state progression. By providing distinctive temporal resolution in prognostic CXR analysis, CXR-TFT offers actionable predictions with the potential to improve the management of time-sensitive critical conditions, where early intervention is crucial but timely diagnosis is challenging.
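The abstract describes aligning temporally sparse CXR embeddings with hourly clinical data through interpolation and then predicting the embedding at each hour from earlier hours. The following is a minimal Python sketch of that data-preparation step only, under stated assumptions: linear interpolation applied to each embedding dimension independently, hours outside the first and last CXR held at the nearest image, and a fixed look-back window of `context` hours. The function names (`interp_1d`, `align_hourly`, `next_hour_pairs`) and the toy numbers are hypothetical, the vision encoder and transformer are not shown, and the paper's actual interpolation and windowing scheme may differ.

```python
from bisect import bisect_right


def interp_1d(t, times, values):
    """Piecewise-linear interpolation of `values` (sampled at sorted `times`) at time t.

    Times before the first sample or after the last one are clamped to the
    nearest sample (an assumption; other edge policies are possible).
    """
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)
    return values[i - 1] + w * (values[i] - values[i - 1])


def align_hourly(cxr_hours, cxr_embeddings, n_hours):
    """Place sparse CXR embeddings on an hourly grid 0..n_hours-1.

    cxr_hours: strictly increasing acquisition times, in hours.
    cxr_embeddings: one embedding (list of floats) per CXR, all the same length.
    Each embedding dimension is interpolated independently.
    """
    dim = len(cxr_embeddings[0])
    columns = [[e[j] for e in cxr_embeddings] for j in range(dim)]
    return [[interp_1d(h, cxr_hours, col) for col in columns] for h in range(n_hours)]


def next_hour_pairs(aligned, clinical, context):
    """Build (input window, target) pairs for next-hour embedding prediction.

    Input: the last `context` hours of [CXR embedding + clinical features].
    Target: the aligned CXR embedding at the following hour.
    """
    rows = [emb + feats for emb, feats in zip(aligned, clinical)]  # per-hour concatenation
    return [(rows[t - context:t], aligned[t]) for t in range(context, len(aligned))]


# Toy example (hypothetical data): three CXRs at hours 0, 6 and 18 with 2-D
# embeddings, plus three clinical features per hour (all zeros here).
cxr_hours = [0.0, 6.0, 18.0]
cxr_embeddings = [[0.0, 1.0], [6.0, 1.0], [18.0, 0.0]]
aligned = align_hourly(cxr_hours, cxr_embeddings, n_hours=24)
clinical = [[0.0, 0.0, 0.0] for _ in range(24)]
pairs = next_hour_pairs(aligned, clinical, context=4)

print(aligned[12])  # [12.0, 0.5], halfway between the hour-6 and hour-18 CXRs
print(len(pairs))   # 20 windows (targets at hours 4..23)
```

One design caveat: linear interpolation between two acquisitions uses the later image, so in a forecasting setup the interpolated values would need careful handling to avoid leaking future information into the model inputs. The abstract does not say how CXR-TFT addresses this.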


Published In

Lecture Notes in Computer Science

DOI

10.1007/978-3-032-05182-0_16

EISSN

1611-3349

ISSN

0302-9743

Publication Date

January 1, 2026

Volume

15974 LNCS

Start / End Page

158 / 166

Related Subject Headings

  • Artificial Intelligence & Image Processing
  • 46 Information and computing sciences
 

Citation

APA: Arora, M., Ali, A., Wu, K., Davis, C., Shimazui, T., Alwakeel, M., … Kamaleswaran, R. (2026). CXR-TFT: Multi-modal Temporal Fusion Transformer for Predicting Chest X-Ray Trajectories (Accepted). In Lecture Notes in Computer Science (Vol. 15974 LNCS, pp. 158–166). https://doi.org/10.1007/978-3-032-05182-0_16

Chicago: Arora, M., A. Ali, K. Wu, C. Davis, T. Shimazui, M. Alwakeel, V. Moas, P. Yang, A. Esper, and R. Kamaleswaran. “CXR-TFT: Multi-modal Temporal Fusion Transformer for Predicting Chest X-Ray Trajectories (Accepted).” In Lecture Notes in Computer Science, 15974 LNCS:158–66, 2026. https://doi.org/10.1007/978-3-032-05182-0_16.

ICMJE: Arora M, Ali A, Wu K, Davis C, Shimazui T, Alwakeel M, et al. CXR-TFT: Multi-modal Temporal Fusion Transformer for Predicting Chest X-Ray Trajectories (Accepted). In: Lecture Notes in Computer Science. 2026. p. 158–66.

MLA: Arora, M., et al. “CXR-TFT: Multi-modal Temporal Fusion Transformer for Predicting Chest X-Ray Trajectories (Accepted).” Lecture Notes in Computer Science, vol. 15974 LNCS, 2026, pp. 158–66. Scopus, doi:10.1007/978-3-032-05182-0_16.

NLM: Arora M, Ali A, Wu K, Davis C, Shimazui T, Alwakeel M, Moas V, Yang P, Esper A, Kamaleswaran R. CXR-TFT: Multi-modal Temporal Fusion Transformer for Predicting Chest X-Ray Trajectories (Accepted). Lecture Notes in Computer Science. 2026. p. 158–166.