Development of a Hybrid CRNN and Tesseract OCR Model to Improve the Accuracy of Text Extraction from Document Images

Syafrie Abdunnasir Jawad; Edy Winarno; Safuan Safuan; Muhammad Munsarif

doi:10.55606/juisik.v6i2.2306

Authors

Syafrie Abdunnasir Jawad, Universitas Muhammadiyah Semarang
Edy Winarno, Universitas Muhammadiyah Semarang
Safuan Safuan, Universitas Muhammadiyah Semarang
Muhammad Munsarif, Universitas Muhammadiyah Semarang

Keywords:

Character Error Rate, CRNN, Optical Character Recognition, Receipt Image, Tesseract

Abstract

Extracting text from document images is still difficult, especially for transaction receipts. Receipt images often contain small characters, dense layouts, blur, noise, skew, shadows, and uneven illumination. This study develops a Tri-Expert hybrid OCR model that combines three recognizers to improve word-level recognition accuracy on receipt images: CRNN(H=64), MSF-CRNN(H=64) (multi-scale fusion CRNN), and Tesseract OCR. The system selects among the experts' outputs using confidence scores calibrated by temperature scaling. A rule-based arbiter built on CRNN(H=48) resolves conflicting predictions, and a numeric-aware fallback to Tesseract handles numeric tokens with low confidence. Experiments were conducted on word crops from the CORD dataset. They were evaluated using Exact Match (EM), Word Error Rate (WER), Character Error Rate (CER), and character-level F1-score (Char-F1). On the validation set (N=2,186), the hybrid method achieved an EM of 91.49%, a CER of 2.37%, and a Char-F1 of 97.96%. On the test set (N=2,356), it achieved an EM of 89.52%, a CER of 3.13%, and a Char-F1 of 97.25%, outperforming both CRNN(H=64) and Tesseract alone. The results indicate that the hybrid design improves OCR reliability, particularly for numeric and short tokens, while remaining modular and practical for web-based deployment.
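
The calibrated confidence step can be illustrated with a minimal sketch, under several assumptions. First, each neural expert emits CTC-style per-frame logits of shape (frames, classes). Second, the word confidence is the probability of the greedy path. Third, a single temperature T is fitted on validation crops by grid search. The search minimizes a binary cross-entropy between that confidence and whether the prediction was correct. This is a simplified word-level variant of temperature scaling chosen for illustration. The function names, the grid, and the fitting objective are assumptions, not the authors' implementation.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def word_confidence(logits: np.ndarray, T: float) -> float:
    """Probability of the greedy path for one crop after dividing logits by T.

    logits: (frames, classes) array (assumed CTC-style output of a CRNN expert).
    """
    probs = softmax(logits / T, axis=-1)
    return float(np.prod(probs.max(axis=-1)))


def fit_temperature(logit_list, correct, grid=np.linspace(0.5, 5.0, 91)) -> float:
    """Pick T from a grid to minimise BCE between confidence and correctness.

    correct: 1 if the expert's prediction matched the label on that validation crop, else 0.
    """
    y = np.asarray(correct, dtype=float)
    best_T, best_nll = 1.0, np.inf
    for T in grid:
        c = np.array([word_confidence(l, T) for l in logit_list])
        c = np.clip(c, 1e-7, 1 - 1e-7)  # avoid log(0)
        nll = -np.mean(y * np.log(c) + (1 - y) * np.log(1 - c))
        if nll < best_nll:
            best_T, best_nll = float(T), nll
    return best_T
```

Dividing logits by a positive T never changes the arg max in any frame, so calibration changes how much each expert's confidence is trusted, not the text it recognizes. A fitted T > 1 softens an overconfident expert, and T < 1 sharpens an underconfident one. This is what makes confidences from different experts comparable during selection.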
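
One possible decision flow for combining the three experts is sketched below. It has three stages:

1. Accept the output when the two neural experts agree.
2. Otherwise, take the most confident candidate if it leads by a clear margin, and ask the CRNN(H=48) arbiter to break close calls.
3. Finally, switch to Tesseract when the chosen token looks numeric but has low confidence.

Several details are hypothetical, because the abstract does not specify the authors' exact rules. These are the thresholds (`margin=0.10`, `num_threshold=0.60`), the rule order, the `is_numeric_like` heuristic, and the assumption that Tesseract's confidence has been rescaled to [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    conf: float  # calibrated confidence in [0, 1] (Tesseract's score assumed rescaled)


def is_numeric_like(s: str) -> bool:
    """Hypothetical heuristic: at least half of the characters are digits."""
    digits = sum(ch.isdigit() for ch in s)
    return digits > 0 and digits * 2 >= len(s)


def select(crnn: Candidate, msf: Candidate, tess: Candidate, arbiter_text: str,
           margin: float = 0.10, num_threshold: float = 0.60) -> str:
    """Return the final word for one crop (hypothetical Tri-Expert rules)."""
    if crnn.text == msf.text:
        # Both neural experts agree: accept their shared output.
        chosen = max(crnn, msf, key=lambda c: c.conf)
    else:
        ranked = sorted([crnn, msf, tess], key=lambda c: c.conf, reverse=True)
        if ranked[0].conf - ranked[1].conf >= margin:
            # A clear winner on calibrated confidence.
            chosen = ranked[0]
        else:
            # Close call: prefer the most confident candidate the arbiter agrees with.
            agreeing = [c for c in ranked if c.text == arbiter_text]
            chosen = agreeing[0] if agreeing else ranked[0]
    # Numeric-aware fallback for low-confidence numeric tokens.
    if (is_numeric_like(chosen.text) and chosen.conf < num_threshold
            and is_numeric_like(tess.text)):
        return tess.text
    return chosen.text
```

For example, suppose the CRNN reads "12.500" at 0.55 and Tesseract reads "12,500" at 0.50. The token is numeric and below the threshold, so the sketch returns Tesseract's reading. Keeping each rule as a separate branch mirrors the modular design described above. Each threshold can be tuned on the validation split independently.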
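
All four metrics can be computed from the Levenshtein edit distance. In this sketch, CER and WER are aggregated over the whole split as total edits divided by total reference characters or words. Char-F1 is computed from the multiset overlap of characters between prediction and reference. These aggregation choices and the Char-F1 definition are assumptions, since the abstract does not define them. When every crop holds a single word, WER is essentially 1 - EM.

```python
from collections import Counter
from typing import Dict, List, Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution or match
        prev = curr
    return prev[-1]


def evaluate(preds: List[str], refs: List[str]) -> Dict[str, float]:
    """EM, CER, WER, and Char-F1 over paired predictions and references."""
    if not refs or len(preds) != len(refs):
        raise ValueError("need equally sized, non-empty prediction/reference lists")
    pairs = list(zip(preds, refs))
    em = sum(p == r for p, r in pairs) / len(pairs)
    char_total = sum(len(r) for r in refs)
    cer = sum(levenshtein(p, r) for p, r in pairs) / max(char_total, 1)
    word_total = sum(len(r.split()) for r in refs)
    wer = sum(levenshtein(p.split(), r.split()) for p, r in pairs) / max(word_total, 1)
    # Char-F1 (assumed definition): multiset character overlap, micro-averaged.
    overlap = sum(sum((Counter(p) & Counter(r)).values()) for p, r in pairs)
    pred_chars = sum(len(p) for p in preds)
    precision = overlap / pred_chars if pred_chars else 0.0
    recall = overlap / char_total if char_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"EM": em, "CER": cer, "WER": wer, "Char-F1": f1}


if __name__ == "__main__":
    scores = evaluate(["TOTAL", "12.500", "Rp"], ["TOTAL", "12,500", "Rp."])
    print({k: round(v, 3) for k, v in scores.items()})
    # {'EM': 0.333, 'CER': 0.143, 'WER': 0.667, 'Char-F1': 0.889}
```

The toy example shows why the paper reports character-level scores alongside EM. A single wrong separator in "12,500" makes that crop count as fully wrong under EM and WER. Under CER and Char-F1, however, it costs only one of six characters.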
