Probing the Augmented Reality Scene Analysis Capabilities of Large Multimodal Models: Toward Reliable Real-Time Assessment Solutions

Publication, Journal Article
Duan, L; Rotondo, E; Xiu, Y; Eom, S; Chen, R; Li, C; Hu, Y; Gorlatova, M
Published in: IEEE Internet Computing
January 1, 2025

Augmented Reality (AR) is transforming everyday experiences across domains like education, entertainment, and healthcare. As AR technologies become increasingly widespread, human-aligned and scalable AR quality evaluation is critical for optimizing immersive user experiences. This paper investigates the potential of Large Multimodal Models (LMMs) for automating AR quality assessment. We curate DiverseAR+, a new dataset of 1,405 scenes collected from diverse sources and environments, and use it to evaluate four commercial LMMs. Our results show that LMMs can perceive, describe, and judge AR content with promising accuracy. To make this evaluation real-time, scalable, and robust, we propose a hybrid cloud-edge architecture that combines LMMs with traditional machine learning models and supports evaluation under diverse network conditions. We argue that task-tailored AR-LMM systems can make AR experience evaluation more efficient, adaptive, and user-centered.

Published In

IEEE Internet Computing

DOI

10.1109/MIC.2025.3622505

EISSN

1941-0131

ISSN

1089-7801

Publication Date

January 1, 2025

Related Subject Headings

  • Networking & Telecommunications
  • 4606 Distributed computing and systems software
  • 4009 Electronics, sensors and digital hardware
  • 1005 Communications Technologies
  • 0906 Electrical and Electronic Engineering
  • 0805 Distributed Computing
 

Citation

APA: Duan, L., Rotondo, E., Xiu, Y., Eom, S., Chen, R., Li, C., … Gorlatova, M. (2025). Probing the Augmented Reality Scene Analysis Capabilities of Large Multimodal Models: Toward Reliable Real-Time Assessment Solutions. IEEE Internet Computing. https://doi.org/10.1109/MIC.2025.3622505
Chicago: Duan, L., E. Rotondo, Y. Xiu, S. Eom, R. Chen, C. Li, Y. Hu, and M. Gorlatova. “Probing the Augmented Reality Scene Analysis Capabilities of Large Multimodal Models: Toward Reliable Real-Time Assessment Solutions.” IEEE Internet Computing, January 1, 2025. https://doi.org/10.1109/MIC.2025.3622505.
ICMJE: Duan L, Rotondo E, Xiu Y, Eom S, Chen R, Li C, et al. Probing the Augmented Reality Scene Analysis Capabilities of Large Multimodal Models: Toward Reliable Real-Time Assessment Solutions. IEEE Internet Computing. 2025 Jan 1;
MLA: Duan, L., et al. “Probing the Augmented Reality Scene Analysis Capabilities of Large Multimodal Models: Toward Reliable Real-Time Assessment Solutions.” IEEE Internet Computing, Jan. 2025. Scopus, doi:10.1109/MIC.2025.3622505.
NLM: Duan L, Rotondo E, Xiu Y, Eom S, Chen R, Li C, Hu Y, Gorlatova M. Probing the Augmented Reality Scene Analysis Capabilities of Large Multimodal Models: Toward Reliable Real-Time Assessment Solutions. IEEE Internet Computing. 2025 Jan 1;
