Implementation and Evaluation of Swin Transformer for Deep Learning-Based Facial Expression Recognition on the CK+ Dataset

Resky Ayu Sahono; Edy Winarno; Safuan Safuan

doi:10.55606/juitik.v6i2.2327

Authors

Resky Ayu Sahono, Universitas Muhammadiyah Semarang
Edy Winarno, Universitas Muhammadiyah Semarang
Safuan Safuan, Universitas Muhammadiyah Semarang

Keywords:

CK+, Deep Learning, Facial Expression Recognition, Swin Transformer, Transfer Learning

Abstract

Facial Expression Recognition (FER) is a computer vision task that aims to identify human emotional states from facial images. Major challenges in FER include pose variation, illumination changes, inter-subject differences, and high visual similarity between certain emotion classes. Recent Transformer-based architectures offer improved modeling of global feature relationships compared with conventional Convolutional Neural Networks (CNNs). This study implements and evaluates a Swin Transformer Tiny model pretrained on ImageNet-1K and fine-tuned on the CK+ dataset with five emotion classes: anger, disgust, fear, happy, and surprise. The experimental procedure includes preprocessing, ImageNet normalization, light data augmentation, and a subject-independent split to prevent identity leakage. A weighted cross-entropy loss is applied to address class imbalance. The model achieves a Top-1 accuracy of 96.53% and a macro F1-score of 97.10%. Confusion matrix analysis indicates strong classification performance, with minor misclassification among visually similar emotions. These results indicate that Swin Transformer effectively captures both local and global facial representations on small-scale FER datasets.
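To make three of the steps named in the abstract concrete, the following is a minimal Python sketch of a subject-independent split, class weights for a weighted cross-entropy loss, and the macro F1-score. It is not the authors' code. The function names, the `(subject_id, label)` sample layout, the 20% test fraction, and the inverse-frequency weighting rule `n / (k * count)` are all assumptions made for illustration; the abstract does not state which split ratio or weighting scheme was used.

```python
import random
from collections import Counter

# The five emotion classes listed in the abstract.
CLASSES = ["anger", "disgust", "fear", "happy", "surprise"]


def subject_independent_split(samples, test_fraction=0.2, seed=0):
    """Split (subject_id, label) pairs so that no subject appears in both sets.

    Splitting by subject rather than by image is what prevents identity
    leakage. The 0.2 test fraction is an assumed value, not from the paper.
    """
    subjects = sorted({subject for subject, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [s for s in samples if s[0] not in test_subjects]
    test = [s for s in samples if s[0] in test_subjects]
    return train, test


def inverse_frequency_weights(labels, classes):
    """Per-class weights n / (k * count_c), so rarer classes weigh more.

    This "balanced" heuristic is an assumption; the abstract only says that
    a weighted cross-entropy loss was used.
    """
    counts = Counter(labels)
    n, k = len(labels), len(classes)
    return {c: (n / (k * counts[c]) if counts[c] else 0.0) for c in classes}


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

In a training script, the weights returned by `inverse_frequency_weights` would typically be ordered by class index and passed to the loss function of whichever framework is in use. Because macro F1 averages the classes equally, it can differ from Top-1 accuracy when the classes are imbalanced, which is consistent with the abstract reporting both metrics.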
