LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion

Publication, Conference
Li, X; Ma, T; Hou, Y; Shi, B; Yang, Y; Liu, Y; Wu, X; Chen, Q; Li, Y; Qiao, Y; He, L
Published in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
January 1, 2023

LiDAR-camera fusion methods have shown impressive performance in 3D object detection. Recent advanced multi-modal methods mainly perform global fusion, where image features and point cloud features are fused across the whole scene. Such practice lacks fine-grained region-level information, yielding suboptimal fusion performance. In this paper, we present the novel Local-to-Global fusion network (LoGoNet), which performs LiDAR-camera fusion at both local and global levels. Concretely, the Global Fusion (GoF) of LoGoNet is built upon previous literature, while we exclusively use point centroids to more precisely represent the position of voxel features, thus achieving better cross-modal alignment. As to the Local Fusion (LoF), we first divide each proposal into uniform grids and then project these grid centers to the images. The image features around the projected grid points are sampled to be fused with position-decorated point cloud features, maximally utilizing the rich contextual information around the proposals. The Feature Dynamic Aggregation (FDA) module is further proposed to achieve information interaction between these locally and globally fused features, thus producing more informative multi-modal features. Extensive experiments on both the Waymo Open Dataset (WOD) and KITTI datasets show that LoGoNet outperforms all state-of-the-art 3D detection methods. Notably, LoGoNet ranks 1st on the Waymo 3D object detection leaderboard and obtains 81.02 mAPH (L2) detection performance. It is noteworthy that, for the first time, the detection performance on three classes surpasses 80 APH (L2) simultaneously. Code will be available at https://github.com/sankin97/LoGoNet.
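
The two geometric steps named in the abstract (using point centroids as voxel positions, and projecting uniform grid centers of a proposal onto the image) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names, the box parameterization (center, size, yaw about the z axis), the simple pinhole camera model, and the assumption that points are already expressed in the camera frame are all illustrative choices that do not come from the abstract; feature sampling and fusion are omitted entirely.

```python
import math
from collections import defaultdict


def voxel_centroids(points, voxel_size):
    """Mean position of the points in each voxel, keyed by integer voxel index.

    Hypothetical helper: shows the idea of using a point centroid,
    rather than the voxel's geometric center, as the voxel position.
    """
    buckets = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    return {
        key: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for key, pts in buckets.items()
    }


def grid_centers(center, size, yaw, n):
    """Centers of an n x n x n uniform grid inside a box rotated by yaw about z."""
    cx, cy, cz = center
    length, width, height = size
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    centers = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # Cell-center offsets in the box frame, within +/- half the box size.
                dx = ((i + 0.5) / n - 0.5) * length
                dy = ((j + 0.5) / n - 0.5) * width
                dz = ((k + 0.5) / n - 0.5) * height
                # Rotate the planar offset by yaw, then translate to the box center.
                centers.append((
                    cx + dx * cos_y - dy * sin_y,
                    cy + dx * sin_y + dy * cos_y,
                    cz + dz,
                ))
    return centers


def project_pinhole(point_cam, fx, fy, u0, v0):
    """Project a camera-frame point (z forward) to pixel coordinates.

    Returns None for points at or behind the image plane. Assumes an
    ideal pinhole camera with no lens distortion.
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    return (fx * x / z + u0, fy * y / z + v0)
```

In a real pipeline the grid centers would first be transformed from the LiDAR frame to the camera frame with the calibrated extrinsics, and the resulting pixel locations would be used to sample image features around each projected grid point.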

Published In

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

DOI

10.1109/CVPR52729.2023.01681

ISSN

1063-6919

Publication Date

January 1, 2023

Volume

2023-June

Start / End Page

17524 / 17534
 

Citation

APA: Li, X., Ma, T., Hou, Y., Shi, B., Yang, Y., Liu, Y., … He, L. (2023). LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 2023-June, pp. 17524–17534). https://doi.org/10.1109/CVPR52729.2023.01681
Chicago: Li, X., T. Ma, Y. Hou, B. Shi, Y. Yang, Y. Liu, X. Wu, et al. “LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion.” In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2023-June:17524–34, 2023. https://doi.org/10.1109/CVPR52729.2023.01681.
ICMJE: Li X, Ma T, Hou Y, Shi B, Yang Y, Liu Y, et al. LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2023. p. 17524–34.
MLA: Li, X., et al. “LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion.” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2023-June, 2023, pp. 17524–34. Scopus, doi:10.1109/CVPR52729.2023.01681.
NLM: Li X, Ma T, Hou Y, Shi B, Yang Y, Liu Y, Wu X, Chen Q, Li Y, Qiao Y, He L. LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2023. p. 17524–17534.
