Deep Long-Term Feature Extraction for Video Classification
A. Hamedooni Asli (1)
(1) Jahad-e-Daneshgahi Institute of Higher Education, Hamedan, Iran
Sh. Javidani (2)
(2) Jahad-e-Daneshgahi Institute of Higher Education, Hamedan, Iran
A. Javidani (3)
(3) Department of Computer Engineering, Faculty of Engineering, Bu-Ali Sina University, Hamedan, Iran
Keywords: Video Classification, Human Action Recognition, Deep Learning, Convolutional Neural Networks, Recurrent Neural Networks, Long Short-Term Memory (LSTM)
Abstract:
This paper presents a novel approach for recognizing ongoing actions in segmented videos, with the main focus on extracting long-term features for effective classification. First, optical-flow images between consecutive frames are computed and described by a pretrained convolutional neural network. To reduce feature-space complexity and simplify training of the temporal model, PCA is applied to the optical-flow descriptors. Next, a lightweight channel-attention module is applied to the low-dimensional PCA features at each time step to emphasize informative components and suppress weak ones. The descriptors of each video are then ordered over time, forming a multi-channel 1D time series from which long-term patterns are learned by a two-layer stacked LSTM. After the LSTM, a temporal-attention module performs time-aware aggregation, weighting informative time steps to produce a coherent context vector for classification. Experiments show that combining PCA with channel and temporal attention improves accuracy on the public UCF11 and jHMDB datasets while keeping the model lightweight and outperforming reference methods. The code is available at: https://github.com/alijavidani/Video_Classification_LSTM
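For concreteness, the PyTorch sketch below illustrates how the temporal part of such a pipeline can be assembled: a channel-attention gate over the PCA-reduced optical-flow descriptors, a two-layer stacked LSTM, and a temporal-attention module that pools the hidden states into a context vector for classification. All dimensions, module names, and the squeeze-and-excitation-style gating are illustrative assumptions for exposition, not the exact configuration released in the repository above.

import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Gates the PCA feature channels at every time step (assumed SE-style design)."""

    def __init__(self, dim, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim),
            nn.Sigmoid(),
        )

    def forward(self, x):           # x: (batch, time, dim)
        return x * self.gate(x)     # reweight each channel per time step


class TemporalAttention(nn.Module):
    """Scores each time step and sums the LSTM outputs into one context vector."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, h):                        # h: (batch, time, dim)
        w = torch.softmax(self.score(h), dim=1)  # attention weights over time
        return (w * h).sum(dim=1)                # (batch, dim) context vector


class LongTermClassifier(nn.Module):
    """Channel attention -> two-layer stacked LSTM -> temporal attention -> classifier."""

    def __init__(self, pca_dim=128, hidden_dim=256, num_classes=11):
        super().__init__()                       # pca_dim/hidden_dim are placeholder sizes
        self.channel_attn = ChannelAttention(pca_dim)
        self.lstm = nn.LSTM(pca_dim, hidden_dim, num_layers=2, batch_first=True)
        self.temporal_attn = TemporalAttention(hidden_dim)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):            # x: (batch, time, pca_dim) PCA-reduced flow descriptors
        x = self.channel_attn(x)
        h, _ = self.lstm(x)          # h: (batch, time, hidden_dim)
        ctx = self.temporal_attn(h)  # time-aware aggregation
        return self.head(ctx)        # class logits


# Example: a batch of 4 videos, 30 flow descriptors each, 128-D after PCA.
logits = LongTermClassifier()(torch.randn(4, 30, 128))
print(logits.shape)  # torch.Size([4, 11]) -- 11 classes, as in UCF11

Feeding the two attention modules this way keeps them lightweight: the channel gate adds only two small linear layers, and the temporal scorer a single linear layer, consistent with the abstract's emphasis on a compact model.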