LightTrack: A generic framework for online top-down human pose tracking
In this paper, we propose a simple yet effective framework, named LightTrack, for online human pose tracking. Existing methods usually perform human detection, pose estimation and tracking in sequential stages, where pose tracking is regarded as an offline bipartite matching problem. Our proposed framework is designed to be generic, efficient and truly online for top-down approaches. For efficiency, Single-Person Pose Tracking (SPT) and Visual Object Tracking (VOT) are incorporated as a unified on-line functioning entity, easily implemented by a replaceable single-person pose estimator. To mitigate offline optimization costs, the framework also unifies SPT with online identity association and sheds first light upon bridging multi-person keypoint tracking with Multi-Target Object Tracking (MOT). Specifically, we propose a Siamese Graph Convolution Network (SGCN) for human pose matching as a Re-ID module. In contrary to other Re-ID modules, we use a graphical representation of human joints for matching. The skeleton-based representation effectively captures human pose similarity and is computationally inexpensive. It is robust to sudden camera shifts that introduce human drifting. The proposed framework is general enough to fit other pose estimators and candidate matching mechanisms. Extensive experiments show that our method outperforms other online methods and is very competitive with offline state-of-the-art methods while maintaining higher frame rates. Code and models are publicly available at https://github.com/Guanghan/lighttrack.