Scholars@Duke publication: Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection

Publication, Conference

Li, X; Shi, B; Hou, Y; Wu, X; Ma, T; Li, Y; He, L

Published in: Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

January 1, 2022

Multi-modal 3D object detection has been an active research topic in autonomous driving. Nevertheless, it is non-trivial to explore cross-modal feature fusion between sparse 3D points and dense 2D pixels. Recent approaches either fuse image features with point cloud features projected onto the 2D image plane, or combine the sparse point cloud with dense image pixels. These fusion approaches often suffer from severe information loss, which leads to sub-optimal performance. To address these problems, we construct a homogeneous structure between the point cloud and images that avoids projective information loss by transforming the camera features into the LiDAR 3D space. In this paper, we propose a homogeneous multi-modal feature fusion and interaction method (HMFI) for 3D object detection. Specifically, we first design an image voxel lifter module (IVLM) to lift 2D image features into the 3D space and generate homogeneous image voxel features. Then, we fuse the voxelized point cloud features with image features from different regions by introducing a self-attention-based query fusion mechanism (QFM). Next, we propose a voxel feature interaction module (VFIM) to enforce the consistency of semantic information from identical objects in the homogeneous point cloud and image voxel representations; this provides object-level alignment guidance for cross-modal feature fusion and strengthens the discriminative ability in complex backgrounds. We conduct extensive experiments on the KITTI and Waymo Open datasets, and the proposed HMFI achieves better performance than state-of-the-art multi-modal methods. In particular, for cyclist 3D detection on the KITTI benchmark, HMFI surpasses all published algorithms by a large margin.
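The idea of lifting 2D image features into the LiDAR 3D space can be illustrated with a common, generic technique: project each voxel center into the image with a calibration matrix and gather the image feature found at that pixel. The sketch below is a minimal NumPy illustration under stated assumptions, not the HMFI implementation; the abstract does not specify how IVLM performs the lifting. The function name `lift_image_features_to_voxels`, the combined 3×4 LiDAR-to-pixel projection matrix, the feature-map `stride`, and nearest-neighbour sampling are all hypothetical choices made for this example.

```python
import numpy as np


def lift_image_features_to_voxels(voxel_centers, proj_lidar_to_img, feat_map, stride=1):
    """Hypothetical sketch: gather one 2D feature vector per 3D voxel center.

    voxel_centers: (N, 3) voxel centers in LiDAR coordinates.
    proj_lidar_to_img: (3, 4) matrix mapping homogeneous LiDAR points to pixels
        (assumed to already combine camera extrinsics and intrinsics).
    feat_map: (C, H, W) image feature map at 1/stride of the input resolution.
    Returns an (N, C) array; voxels behind the camera or outside the image get zeros.
    """
    n = voxel_centers.shape[0]
    c, h, w = feat_map.shape
    # Homogeneous LiDAR coordinates, shape (N, 4)
    pts_h = np.hstack([voxel_centers, np.ones((n, 1))])
    # Project onto the image plane: each row is (u * z, v * z, z)
    uvz = pts_h @ proj_lidar_to_img.T
    depth = uvz[:, 2]
    in_front = depth > 1e-6
    # Perspective divide; use a dummy depth for points behind the camera to avoid dividing by zero
    safe_depth = np.where(in_front, depth, 1.0)
    u = uvz[:, 0] / safe_depth / stride
    v = uvz[:, 1] / safe_depth / stride
    # Nearest-neighbour pixel indices on the (possibly downsampled) feature map
    col = np.round(u).astype(int)
    row = np.round(v).astype(int)
    valid = in_front & (col >= 0) & (col < w) & (row >= 0) & (row < h)
    out = np.zeros((n, c), dtype=feat_map.dtype)
    # feat_map[:, rows, cols] has shape (C, K); transpose to (K, C)
    out[valid] = feat_map[:, row[valid], col[valid]].T
    return out


if __name__ == "__main__":
    feat = np.arange(32, dtype=float).reshape(2, 4, 4)
    # Toy projection: u = x / z, v = y / z, depth = z
    P = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]], dtype=float)
    pts = np.array([[2.0, 4.0, 2.0], [0.0, 0.0, -1.0], [100.0, 0.0, 1.0]])
    print(lift_image_features_to_voxels(pts, P, feat))
```

In this sketch, voxels that fall behind the camera or outside the image simply receive zero features; a practical detector might instead use bilinear sampling, several cameras, or a learned depth distribution.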

Duke Scholars

Author: Xin Li, Electrical and Computer Engineering

Published In

Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

DOI

10.1007/978-3-031-19839-7_40

EISSN

1611-3349

ISSN

0302-9743

Publication Date

January 1, 2022

Volume

13698 LNCS

Start / End Page

691 / 707

Related Subject Headings

Artificial Intelligence & Image Processing
46 Information and computing sciences

Citation

APA

Li, X., Shi, B., Hou, Y., Wu, X., Ma, T., Li, Y., & He, L. (2022). Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection. In Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics (Vol. 13698 LNCS, pp. 691–707). https://doi.org/10.1007/978-3-031-19839-7_40

Chicago

Li, X., B. Shi, Y. Hou, X. Wu, T. Ma, Y. Li, and L. He. “Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection.” In Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 13698 LNCS:691–707, 2022. https://doi.org/10.1007/978-3-031-19839-7_40.

ICMJE

Li X, Shi B, Hou Y, Wu X, Ma T, Li Y, et al. Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection. In: Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics. 2022. p. 691–707.

MLA

Li, X., et al. “Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection.” Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, vol. 13698 LNCS, 2022, pp. 691–707. Scopus, doi:10.1007/978-3-031-19839-7_40.

NLM

Li X, Shi B, Hou Y, Wu X, Ma T, Li Y, He L. Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection. Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics. 2022. p. 691–707.
