Comparison of Machine Learning Algorithms (Logistic Regression, SVM, KNN, Decision Tree, Random Forest, and Gradient Boosting) for Daily Rainfall Prediction in Lampung Province

Authors

  • Ayu Aprilia, Universitas Lampung
  • Alka Budi Wahidin, Institut Teknologi Sumatera
  • Syafriadi Syafriadi, Universitas Lampung
  • Pulung Karo Karo, Universitas Lampung

DOI:

https://doi.org/10.55606/juisik.v5i3.1901

Keywords:

Ensemble Learning, Machine Learning, Meteorological Data, Rainfall Prediction, Satellite Validation

Abstract

Rainfall prediction is important for meteorological analysis, natural resource management, and hydrometeorological disaster mitigation, and its development has gone hand in hand with modern predictive models such as Machine Learning (ML) algorithms. This study evaluates six ML algorithms: Logistic Regression (LR), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Gradient Boosting Machine (GBM). The prediction combines data from three sources, namely ERA5, NASA POWER, and BMKG Lampung, taken at three BMKG Lampung observation stations (Bandar Lampung, Pesawaran, and Kotabumi) for the period 2020-2025. Performance was evaluated using accuracy, precision, recall, F1-score, the confusion matrix, the ROC curve, and AUC. The results show that Random Forest achieves the best performance across all metrics, with an AUC of 0.715, which indicates that RF gives stable predictions for highly complex rainfall cases. The Decision Tree model, by contrast, showed low accuracy (0.66) and an AUC of 0.579, suggesting that ensembles of multiple trees are more effective for predictive performance than a single tree. These results highlight the importance of integrating meteorological datasets from multiple sources with advanced ensemble learning methods to improve rainfall prediction accuracy and support climate-related decision-making at the regional level.
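The evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`confusion_counts`, `classification_metrics`, `roc_auc`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and none of the numbers come from the study. Labels use 1 for a rain day and 0 for a dry day.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 = rain and 0 = no rain."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the probability that a random rain day outscores a random dry day.

    Ties count as half a win (the rank-based definition of AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical data, not from the study).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.1]  # predicted rain probability
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas AUC summarizes ranking quality over all thresholds, which is one reason a study like this would report both kinds of measure.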

Published

2025-11-30

How to Cite

Ayu Aprilia, Alka Budi Wahidin, Syafriadi Syafriadi, & Pulung Karo Karo. (2025). Perbandingan Algoritma Machine Learning (Logistic Regression, SVM, KNN, Decision Tree, Random Forest, dan Gradient Boosting) dalam Prediksi Hujan Harian di Provinsi Lampung. Jurnal Ilmiah Sistem Informasi Dan Ilmu Komputer, 5(3), 755–764. https://doi.org/10.55606/juisik.v5i3.1901
