Volumetric spatial feature representation for view-invariant human action recognition using a depth camera

Journal Article

The problem of viewpoint variation is a challenging issue in vision-based human action recognition. With the advent of three-dimensional (3-D) depth cameras, the richer information provided by 3-D point clouds makes it possible to analyze spatial variations in human actions effectively. In this paper, we propose a volumetric spatial feature representation (VSFR) that measures the density of 3-D point clouds for view-invariant human action recognition from depth image sequences. Using VSFR, we construct a self-similarity matrix (SSM) that graphically represents temporal variations in the depth sequence. To obtain an SSM, we compute the squared Euclidean distance between the VSFRs of each pair of frames in a video sequence. In this manner, the SSM captures the dissimilarity between pairs of frames in terms of spatial information, regardless of the viewpoint from which the sequence was captured. Furthermore, because features are encoded with a bag-of-features method, the proposed method efficiently handles variations in action speed and length. Hence, our method is robust to variations in both viewpoint and action-sequence length. We evaluated the proposed method against state-of-the-art methods on three public datasets, ACT42, MSRAction3D, and MSRDailyActivity3D, where it achieved the highest accuracy.
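
As a rough illustration of the pipeline summarized above, the sketch below builds a self-similarity matrix from per-frame point-cloud density features. Only the SSM definition (squared Euclidean distance between the VSFRs of frame pairs) comes from the abstract; the voxel-grid resolution, the fixed scene bounds, and the function names are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def volumetric_density_features(points, grid_shape=(8, 8, 8), bounds=((0.0, 1.0),) * 3):
        # Hypothetical stand-in for VSFR: voxelize one frame's 3-D point cloud
        # (an (N, 3) array) on a fixed grid and return normalized voxel densities.
        # `bounds` should be scene extents shared across frames so that the
        # densities are comparable between frames.
        hist, _ = np.histogramdd(points, bins=grid_shape, range=bounds)
        return hist.ravel() / max(len(points), 1)

    def self_similarity_matrix(features):
        # SSM entry (i, j) = squared Euclidean distance between the VSFRs of
        # frames i and j, as stated in the abstract.
        f = np.asarray(features)          # shape: (num_frames, feature_dim)
        sq = np.sum(f ** 2, axis=1)
        ssm = sq[:, None] + sq[None, :] - 2.0 * (f @ f.T)
        return np.maximum(ssm, 0.0)       # clip tiny negatives from rounding

    # Example with synthetic point clouds standing in for depth-frame data:
    rng = np.random.default_rng(0)
    frames = [rng.uniform(0.0, 1.0, size=(500, 3)) for _ in range(30)]
    ssm = self_similarity_matrix([volumetric_density_features(p) for p in frames])

In the paper's pipeline, such an SSM would then feed the bag-of-features encoding mentioned in the abstract, which is what makes the final descriptor insensitive to action speed and sequence length.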

Cited Authors

  • Cho, SS; Lee, AR; Suk, HI; Park, JS; Lee, SW

Published Date

  • March 1, 2015

Published In

  • Optical Engineering

Volume / Issue

  • 54 / 3

Electronic International Standard Serial Number (EISSN)

  • 1560-2303

International Standard Serial Number (ISSN)

  • 0091-3286

Digital Object Identifier (DOI)

  • 10.1117/1.OE.54.3.033102

Citation Source

  • Scopus