A hybrid deep learning approach for deepfake detection using spatial and temporal features with attention mechanisms
Author(s): Jaspreet Singh, Madan Lal and Kanwal Preet Singh Attwal
Abstract: Deepfake attacks threaten the authenticity of digital media, so robust detection methods are needed to counter them. We propose a deepfake detection system in which EfficientNetB3 serves as a spatial feature extractor, a BiLSTM models the temporal sequence of frames, and a self-attention mechanism focuses on the most discriminative frames. Evaluated on the challenging Celeb-DF dataset, the method achieves an accuracy of 85% on the test split. This suggests that the proposed method captures spatial and temporal discrepancies in deepfake videos and is a viable candidate for analyzing high-quality synthesized content. Early stopping was applied to prevent the model from overfitting the training data and to improve generalization to unseen data. Future work aims to improve the robustness of the face detector and to explore multimodal approaches that could further improve inference accuracy.
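The following is a minimal, hypothetical sketch of two components named in the abstract, written in plain Python for illustration and not taken from the authors' code. The first is attention pooling over per-frame feature vectors such as BiLSTM outputs. The second is a patience-based early-stopping check. The abstract does not specify the exact attention formulation, so the sketch assumes scaled dot-product scores computed against a single learned query vector. The names `attention_pool`, `EarlyStopping`, `patience`, and `min_delta` are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    frames: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Pool per-frame vectors (e.g. BiLSTM outputs) into one video vector.

    ASSUMPTION: frame t is scored as <h_t, q> / sqrt(d) against a learned
    query q; the paper's exact attention formulation may differ.
    Returns (pooled_vector, per_frame_weights).
    """
    d = len(query)
    scores = [sum(h_j * q_j for h_j, q_j in zip(h, query)) / math.sqrt(d) for h in frames]
    weights = softmax(scores)
    # Weighted sum over frames, one output dimension at a time.
    pooled = [sum(w * h[j] for w, h in zip(weights, frames)) for j in range(d)]
    return pooled, weights


class EarlyStopping:
    """Hypothetical patience-based early stopping on validation loss."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


if __name__ == "__main__":
    # Two toy 2-D frame vectors; the query favors the first dimension.
    pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [10.0, 0.0])
    print("frame weights:", weights)

    stopper = EarlyStopping(patience=3)
    for epoch, loss in enumerate([0.9, 0.7, 0.65, 0.66, 0.67, 0.68]):
        if stopper.step(loss):
            print("stopping at epoch", epoch)
            break
```

In a trained model, the query vector would be learned jointly with the BiLSTM and classifier weights. `step` would be called once per epoch with the validation loss. The weights returned by `attention_pool` indicate which frames contributed most to the video-level prediction.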
Jaspreet Singh, Madan Lal, Kanwal Preet Singh Attwal. A hybrid deep learning approach for deepfake detection using spatial and temporal features with attention mechanisms. Int J Eng Comput Sci 2025;7(2):30-37. DOI: 10.33545/26633582.2025.v7.i2a.196