Spatiotemporal interest points have been successfully used for the representation of human activities in image sequences. In this part of our research, we propose the use of spatiotemporal salient points, obtained by extending to the temporal dimension the information-theoretic salient-feature detector developed by Kadir and Brady. Our goal is a sparse representation of a human action as a set of spatiotemporal points located at peaks in activity variation, such as those occurring at the edges of a moving object.
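To illustrate entropy-based scale selection in the spirit of the Kadir–Brady detector, the following is a minimal 1-D sketch; the rectangular window, histogram bin count, and the use of a simple intensity signal are assumptions made purely for illustration, not the detector's actual formulation:

```python
import numpy as np

def local_entropy(signal, center, scale, bins=16):
    """Shannon entropy of the intensity values inside a window of
    half-width `scale` around `center` (illustrative 1-D case)."""
    lo, hi = max(0, center - scale), min(len(signal), center + scale + 1)
    hist, _ = np.histogram(signal[lo:hi], bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins before taking logs
    return float(-(p * np.log2(p)).sum())

def salient_scales(signal, center, scales):
    """Return the candidate scales at which the local entropy attains
    a local maximum, mimicking automatic scale selection."""
    h = [local_entropy(signal, center, s) for s in scales]
    return [scales[i] for i in range(1, len(h) - 1)
            if h[i] > h[i - 1] and h[i] > h[i + 1]]
```

In a flat region the window's intensity histogram is concentrated in one bin and the entropy is zero, whereas a window straddling an intensity edge yields a spread-out histogram and high entropy, which is why such peaks line up with activity variation.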
The scales at which the entropy achieves local maxima are detected automatically. Each image sequence is then represented as a set of spatiotemporal salient points. We use the chamfer distance as an appropriate distance measure between two such representations. To deal with differences in the speed at which actions are executed and to achieve invariance to the subjects' spatial scale, we propose a linear space–time warping technique that linearly warps any two examples onto each other by minimizing their chamfer distance. A simple k-nearest-neighbor (kNN) classifier and a classifier based on relevance vector machines (RVMs) are used to evaluate the effectiveness of the representation. We test the proposed method on real image sequences, using aerobic exercises as our test domain.
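The matching step can be sketched as follows. This is a minimal illustration, not the paper's implementation: points are assumed to be (t, x, y) triples, the warp is restricted to a linear scaling of the temporal coordinate, and the scaling factor is found by simple grid search rather than any particular optimizer:

```python
import numpy as np

def chamfer(A, B):
    """Symmetric chamfer distance between two point sets (n x d and m x d
    arrays): the mean nearest-neighbour distance, averaged over both
    directions so the measure is symmetric in A and B."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return 0.5 * (D.min(axis=1).mean() + D.min(axis=0).mean())

def best_linear_warp(A, B, factors=np.linspace(0.5, 2.0, 31)):
    """Grid-search a linear scaling of the temporal coordinate of A
    (column 0, by assumption) that minimises the chamfer distance to B;
    returns the best factor and the resulting distance."""
    best = min(factors, key=lambda f: chamfer(A * [f, 1.0, 1.0], B))
    return best, chamfer(A * [best, 1.0, 1.0], B)
```

For example, if B is a copy of A performed at half speed (temporal coordinates doubled), the search recovers a scaling factor of 2 and a chamfer distance of essentially zero, which is the sense in which the warp normalizes execution speed before classification.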