Probing Multimodal LLMs as World Models for Driving

Publication, Journal Article
Sreeram, S; Wang, TH; Maalouf, A; Rosman, G; Karaman, S; Rus, D
Published in: IEEE Robotics and Automation Letters
January 1, 2025

We provide a sober look at the application of Multimodal Large Language Models (MLLMs) in autonomous driving, challenging common assumptions about their ability to interpret dynamic driving scenarios. Despite advances in models like GPT-4o, their performance in complex driving environments remains largely unexplored. Our experimental study assesses various MLLMs as world models using in-car camera perspectives and reveals that while these models excel at interpreting individual images, they struggle to synthesize coherent narratives across frames, leading to considerable inaccuracies in understanding (i) ego vehicle dynamics, (ii) interactions with other road actors, (iii) trajectory planning, and (iv) open-set scene reasoning. We introduce the Eval-LLM-Drive dataset and DriveSim simulator to enhance our evaluation, highlighting gaps in current MLLM capabilities and the need for improved models in dynamic real-world environments.

Published In

IEEE Robotics and Automation Letters

DOI

10.1109/LRA.2025.3608656

EISSN

2377-3766

Publication Date

January 1, 2025

Volume

10

Issue

11

Start / End Page

11403 / 11410

Related Subject Headings

  • 4602 Artificial intelligence
  • 4007 Control engineering, mechatronics and robotics
  • 0913 Mechanical Engineering
 

Citation

APA: Sreeram, S., Wang, T. H., Maalouf, A., Rosman, G., Karaman, S., & Rus, D. (2025). Probing Multimodal LLMs as World Models for Driving. IEEE Robotics and Automation Letters, 10(11), 11403–11410. https://doi.org/10.1109/LRA.2025.3608656

Chicago: Sreeram, S., T. H. Wang, A. Maalouf, G. Rosman, S. Karaman, and D. Rus. “Probing Multimodal LLMs as World Models for Driving.” IEEE Robotics and Automation Letters 10, no. 11 (January 1, 2025): 11403–10. https://doi.org/10.1109/LRA.2025.3608656.

ICMJE: Sreeram S, Wang TH, Maalouf A, Rosman G, Karaman S, Rus D. Probing Multimodal LLMs as World Models for Driving. IEEE Robotics and Automation Letters. 2025 Jan 1;10(11):11403–10.

MLA: Sreeram, S., et al. “Probing Multimodal LLMs as World Models for Driving.” IEEE Robotics and Automation Letters, vol. 10, no. 11, Jan. 2025, pp. 11403–10. Scopus, doi:10.1109/LRA.2025.3608656.

NLM: Sreeram S, Wang TH, Maalouf A, Rosman G, Karaman S, Rus D. Probing Multimodal LLMs as World Models for Driving. IEEE Robotics and Automation Letters. 2025 Jan 1;10(11):11403–11410.
