An Improved Xception Model with Multi-scale Feature Fusion and Convolutional Block Attention Module (CBAM) for AI-Driven Facial Expression Recognition
Keywords:
Xception; multi-scale depthwise separable module; multi-channel; feature fusion; CBAM

Abstract
The continuous development of deep learning (DL) has significantly advanced computer vision, particularly through convolutional neural networks (CNNs) such as Xception. In facial expression recognition, the depthwise separable CNN Xception has greatly improved the network's representation ability and, in turn, recognition rates. However, two critical issues remain in facial feature extraction: first, single-scale expression features fail to adequately capture the rich information in facial expressions; second, expression features are not evenly distributed across the face image. To address the first issue, a multi-scale, multi-channel recognition method is designed to extract these diverse features; to address the second, the Convolutional Block Attention Module (CBAM) is incorporated into the multi-scale, multi-channel convolutional neural network. This article takes Xception as the base network, uses multi-scale depthwise separable modules with 3×3, 5×5, and 7×7 convolution kernels to extract richer expression features, fuses the multi-channel convolutional features, and adds the attention mechanism module. The proposed model achieves a recognition accuracy of 94.35% on the FER2013 dataset, 1.6 percentage points higher than the unimproved Xception network, and clearly outperforms five other network models, verifying the effectiveness of the improvement measures. The proposed method has significant engineering application value and can be widely applied in areas such as telemedicine, smart education, and autonomous driving.
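To make the described block concrete, the following is a minimal NumPy sketch of the pipeline the abstract outlines: three depthwise-separable branches with 3×3, 5×5, and 7×7 kernels, channel-wise fusion by concatenation, and a CBAM-style channel-then-spatial attention step. All weights are random placeholders, the learned 7×7 convolution inside CBAM's spatial attention is replaced by a fixed box filter, and the branch/channel sizes are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def depthwise_conv(feat, kernels):
    """Per-channel (depthwise) convolution; edge padding keeps spatial size.
    feat: (C, H, W); kernels: (C, k, k)."""
    C, H, W = feat.shape
    k = kernels.shape[1]
    p = k // 2
    out = np.zeros_like(feat)
    for c in range(C):
        padded = np.pad(feat[c], p, mode="edge")
        for i in range(H):
            for j in range(W):
                out[c, i, j] = (padded[i:i + k, j:j + k] * kernels[c]).sum()
    return out

def pointwise_conv(feat, w):
    """1x1 convolution mixing channels. w: (C_out, C_in)."""
    return np.tensordot(w, feat, axes=([1], [0]))

def separable_branch(feat, dw_kernels, pw_weights):
    """One depthwise-separable branch: depthwise conv, then 1x1 conv."""
    return pointwise_conv(depthwise_conv(feat, dw_kernels), pw_weights)

def channel_attention(feat, w1, w2):
    """CBAM channel attention: avg- and max-pooled descriptors pass through
    a shared two-layer MLP, are summed, and squashed to channel weights."""
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer
    att = sigmoid(mlp(feat.mean(axis=(1, 2))) + mlp(feat.max(axis=(1, 2))))
    return feat * att[:, None, None]

def spatial_attention(feat, k=7):
    """CBAM spatial attention. The learned 7x7 conv over the stacked
    avg/max maps is stood in for by a fixed box filter (assumption)."""
    m = (feat.mean(axis=0) + feat.max(axis=0)) / 2.0
    p = k // 2
    padded = np.pad(m, p, mode="edge")
    H, W = m.shape
    smooth = np.array([[padded[i:i + k, j:j + k].mean()
                        for j in range(W)] for i in range(H)])
    return feat * sigmoid(smooth)[None, :, :]

def multiscale_cbam_block(feat, rng, c_out=4, reduction=2):
    """3x3 / 5x5 / 7x7 separable branches -> channel concat -> CBAM."""
    C = feat.shape[0]
    branches = []
    for k in (3, 5, 7):
        dw = rng.standard_normal((C, k, k)) * 0.1
        pw = rng.standard_normal((c_out, C)) * 0.1
        branches.append(separable_branch(feat, dw, pw))
    fused = np.concatenate(branches, axis=0)        # (3*c_out, H, W)
    Cf = fused.shape[0]
    w1 = rng.standard_normal((Cf // reduction, Cf)) * 0.1
    w2 = rng.standard_normal((Cf, Cf // reduction)) * 0.1
    return spatial_attention(channel_attention(fused, w1, w2))

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))   # toy (C, H, W) feature map
y = multiscale_cbam_block(x, rng)
print(y.shape)                       # (12, 8, 8)
```

The sketch shows why the fused map is cheap to attend over: each branch keeps the spatial size, so concatenation only grows the channel dimension, and CBAM then reweights channels and spatial positions in sequence.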