Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models

Publication, Conference
Wang, TH; Maalouf, A; Xiao, W; Ban, Y; Amini, A; Rosman, G; Karaman, S; Rus, D
Published in: Proceedings IEEE International Conference on Robotics and Automation
January 1, 2024

As autonomous driving technology matures, end-to-end methodologies have emerged as a leading strategy, promising seamless integration from perception to control via deep learning. However, existing systems grapple with challenges such as unexpected open-set environments and the complexity of black-box models. At the same time, the evolution of deep learning has introduced larger, multi-modal foundation models offering combined visual and textual understanding. In this paper, we harness these multi-modal foundation models to enhance the robustness and adaptability of autonomous driving systems. We introduce a method to extract nuanced spatial features from transformers and incorporate latent-space simulation for improved training and policy debugging. We use pixel/patch-aligned feature descriptors to extend foundation-model capabilities into an end-to-end multi-modal driving model, demonstrating unparalleled results in diverse tests. Our solution combines language with visual perception and achieves significantly greater robustness in out-of-distribution situations.
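The abstract's core idea — pixel/patch-aligned feature descriptors that let language modulate spatial features before a control head — can be illustrated with a toy sketch. This is not the paper's implementation: the "encoders" below are random linear maps standing in for real vision and language foundation models, and all shapes, names, and the pooling-based policy head are hypothetical.

```python
# Illustrative sketch (NOT the paper's code): patch-aligned vision-language
# features feeding an end-to-end driving policy. Toy encoders throughout.
import numpy as np

rng = np.random.default_rng(0)

PATCH, DIM = 16, 64  # 16x16 pixel patches, 64-d shared embedding space


def patch_features(image: np.ndarray) -> np.ndarray:
    """Split an HxWx3 image into patches and embed each (toy linear map)."""
    H, W, _ = image.shape
    gh, gw = H // PATCH, W // PATCH
    patches = image[:gh * PATCH, :gw * PATCH].reshape(gh, PATCH, gw, PATCH, 3)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(gh * gw, -1)
    proj = rng.standard_normal((patches.shape[1], DIM)) / np.sqrt(patches.shape[1])
    feats = patches @ proj
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)


def text_embedding(prompt: str) -> np.ndarray:
    """Stand-in for a language encoder; a real model would embed `prompt`."""
    vec = rng.standard_normal(DIM)  # placeholder, not a real text encoder
    return vec / np.linalg.norm(vec)


# Patch-aligned descriptors: per-patch similarity to a text query lets
# language modulate spatial features before the control head.
image = rng.random((128, 128, 3))
feats = patch_features(image)               # (num_patches, DIM)
query = text_embedding("pedestrian ahead")  # (DIM,)
alignment = feats @ query                   # (num_patches,) per-patch scores

# Toy policy head: alignment-weighted pooling -> scalar steering command.
weights = np.exp(alignment) / np.exp(alignment).sum()  # softmax over patches
pooled = weights @ feats                    # (DIM,)
steer = float(np.tanh(pooled.sum()))        # control output in (-1, 1)
print(f"patches: {feats.shape[0]}, steering: {steer:+.3f}")
```

The hedged point of the sketch is the data flow: language and vision meet at the patch level, not only at a global pooled embedding, which is what makes per-region reasoning (and latent-space editing for policy debugging) possible in the paper's framing.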


Published In

Proceedings IEEE International Conference on Robotics and Automation

DOI

10.1109/ICRA57147.2024.10611590

ISSN

1050-4729

Publication Date

January 1, 2024

Start / End Page

6687 / 6694

Citation

Wang, T. H., A. Maalouf, W. Xiao, Y. Ban, A. Amini, G. Rosman, S. Karaman, and D. Rus. “Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models.” In Proceedings IEEE International Conference on Robotics and Automation, 6687–94, 2024. https://doi.org/10.1109/ICRA57147.2024.10611590.
