Cross-modal Assisted Training for Abnormal Event Recognition in Elevators
Because few action recognition datasets collected in elevators contain multimodal data, we collect and present a multimodal dataset targeting passenger safety and improper elevator use. Moreover, we propose a novel framework (RGBP) that exploits multimodal data during training to enhance unimodal test performance on the task of abnormal event recognition in elevators. Experimental results show that the best network architecture trained with the RGBP framework improves unimodal inference performance on the Elevator RGBD dataset by 4.71% in accuracy and 4.95% in F1 score relative to the pure RGB model. In addition, our RGBP framework outperforms two other "multimodal training, unimodal inference" methods: MTUT [1] and a two-stage method based on depth estimation.
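To make the "multimodal training, unimodal inference" setting concrete, the toy sketch below trains a predictor that sees both an RGB feature and a depth feature during training, but consumes only the RGB feature at test time; the depth signal acts as an auxiliary target. All names (`RGBPSketch`, `aux`, the scalar features) are illustrative assumptions, not the authors' actual RGBP implementation.

```python
class RGBPSketch:
    """Toy scalar model: multimodal (RGB + depth) training, RGB-only inference."""

    def __init__(self):
        self.w_rgb = 0.0  # single weight applied to the RGB feature

    def predict(self, rgb):
        # Unimodal inference: only the RGB feature is consumed.
        return self.w_rgb * rgb

    def train_step(self, rgb, depth, label, lr=0.1, aux=1.0):
        pred = self.predict(rgb)
        # Supervised squared-error loss on the RGB branch ...
        grad = 2.0 * (pred - label) * rgb
        # ... plus an auxiliary term pulling the RGB prediction toward a
        # depth-derived target, so depth acts as privileged training info.
        grad += aux * 2.0 * (pred - depth) * rgb
        self.w_rgb -= lr * grad


model = RGBPSketch()
for _ in range(200):
    # Depth is available only here, during training.
    model.train_step(rgb=1.0, depth=0.9, label=1.0)

# At test time the depth stream is absent; prediction uses RGB alone.
print(round(model.predict(1.0), 2))  # converges near (1.0 + 0.9) / 2 = 0.95
```

The fixed point balances the supervised and auxiliary losses, so the learned weight settles between the label and the depth target; in a real network the same idea is applied per-layer or per-feature rather than to a single scalar.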