Cross-Modal Integrative Feature Network for Sketch-based 3D Shape Retrieval
This paper proposes a novel neural network architecture, the Cross-Modal Integrative Feature Network (CMIFN), to address three challenges in sketch-based 3D shape retrieval. First, existing methods, such as those based on multi-view CNNs, mostly capture surface visual features and ignore internal geometric structure. CMIFN integrates both multi-view and geometry features of 3D objects, thereby extracting a comprehensive global feature. Second, existing methods often augment sketches to enrich them, which can introduce redundant information. Using an attention mechanism, CMIFN suppresses this redundancy while producing a more accurate sketch representation. Third, existing methods often measure the distance between sketches and 3D shapes in a single feature space without accounting for their inherent modality differences, which can lead to suboptimal retrieval results. CMIFN introduces a modality-weighted classifier module that assigns different weights to features from different modalities, constructing a shared feature space that narrows the gap between similar objects across modalities and thus improves retrieval accuracy. Comprehensive experiments demonstrate that CMIFN achieves state-of-the-art performance on benchmark datasets.
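To make the modality-weighted classifier idea concrete, the following is a minimal PyTorch sketch of one plausible realization: each modality is projected into a shared space, a learnable softmax gate rebalances the two modalities, and a shared classifier ties the spaces together. All names (SharedSpaceClassifier, feat_dim, shared_dim) and the gating design are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class SharedSpaceClassifier(nn.Module):
    """Hypothetical modality-weighted classifier over a shared feature space."""

    def __init__(self, feat_dim: int, shared_dim: int, num_classes: int):
        super().__init__()
        # Separate projections map each modality into a common feature space.
        self.sketch_proj = nn.Linear(feat_dim, shared_dim)
        self.shape_proj = nn.Linear(feat_dim, shared_dim)
        # Learnable scalars rebalance the modalities (assumption: a simple
        # softmax gate; the paper's exact weighting scheme may differ).
        self.modality_logits = nn.Parameter(torch.zeros(2))
        # A single classifier shared by both modalities encourages same-class
        # sketches and shapes to cluster together in the shared space.
        self.classifier = nn.Linear(shared_dim, num_classes)

    def forward(self, sketch_feat: torch.Tensor, shape_feat: torch.Tensor):
        w = torch.softmax(self.modality_logits, dim=0)
        z_sketch = w[0] * self.sketch_proj(sketch_feat)
        z_shape = w[1] * self.shape_proj(shape_feat)
        return self.classifier(z_sketch), self.classifier(z_shape)

# Usage sketch: 256-d backbone features for a batch of 4 sketch/shape pairs.
model = SharedSpaceClassifier(feat_dim=256, shared_dim=128, num_classes=40)
sketch_logits, shape_logits = model(torch.randn(4, 256), torch.randn(4, 256))
```

Training both classification heads with a shared output layer is one simple way to pull the two modalities toward a common embedding; the paper's module may additionally combine this with a metric or triplet loss.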