A Novel 3D Human Action Recognition Framework for Video Content Analysis

Abstract

Understanding the meanings of human actions from 3D skeleton data embedded videos is a new challenge in content-oriented video analysis. In this paper, we propose to incorporate temporal patterns of joint positions with currently popular Long Short-Term Memory (LSTM) based learning to improve both accuracy and robustness. Regarding 3D actions are formed by sub-actions, we first propose Wavelet Temporal Pattern (WTP) to extract representations of temporal patterns for each sub-action by wavelet transform. Then, we define a novel Relation-aware LSTM (R-LSTM) structure to extract features by modeling the long-term spatio-temporal correlation between body parts. Regarding WTP and R-LSTM features as heterogeneous representations for human actions, we next fuse WTP and R-LSTM features by an AutoEncoder network to define a more effective action descriptor for classification. The experimental results on a large scale challenging dataset NTU-RGB+D and several other datasets consisting of UT-Kinect and Florence 3D actions for 3D human action analysis demonstrate the effectiveness of the proposed method.

Publication
24th International Conference on MultiMedia Modeling (MMM 2018)
Yirui Wu
Yirui Wu
Young Professor, CCF Senior Member

My research interests include Computer Vision, Artifical Intelligence, Multimedia Computing and Intelligent Water Conservancy.