Assessing the performance of different variations of ensembled tree models in chlorophyll concentration prediction

Nelson⁽¹⁾, Jimmy Tjen⁽²⁾, Genrawan Hoendarto⁽³⁾

(1) Department of Informatics, Universitas Widya Dharma Pontianak, Indonesia, 78117

(2) Department of Informatics, Universitas Widya Dharma Pontianak, Indonesia, 78117

(3) Department of Informatics, Universitas Widya Dharma Pontianak, Indonesia, 78117

DOI: https://doi.org/10.18535/sshj.v9i01.1461

Issue
Vol. 9 No. 01 (2025)

Published
2025-01-03

Keywords:

Ensembled Tree, Chlorophyll Concentration, Performance analysis, Prediction

Abstract

Severe microalgal blooms are detrimental to human life and to the aquatic ecosystems in which they occur. Chlorophyll concentration is a common parameter used to predict microalgal blooms. In this study, several ensembled tree models, namely random forest, extra trees, and the gradient boosting frameworks XGBoost and LightGBM, were implemented to predict the chlorophyll concentration in running water. The models were compared on predictive performance, measured by the normalized root mean square error (NRMSE), and on average computing time. The hyperparameters of each model were tuned using random search. Random forest took 16.95 ms to compute with an NRMSE of 0.75; XGBoost took 8.28 ms with an NRMSE of 0.71; LightGBM took 2.81 ms with an NRMSE of 0.63; and extra trees took 17.15 ms with an NRMSE of 0.72. Both gradient boosting frameworks outperformed random forest and extra trees, achieving a lower NRMSE in less computing time, and LightGBM performed best on both measures. The aim of this study is to identify a faster alternative, with similar or better accuracy, to random forest as a baseline for predicting chlorophyll concentration.
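
The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the RMSE was normalized, so the function below assumes normalization by the range of the observed values; normalizing by the mean or the standard deviation is equally common and would give different figures. The names nrmse, reported, best_by_error, and best_by_time are hypothetical and are not taken from the paper. Only the four (NRMSE, time) pairs come from the abstract.

```python
import math


def nrmse(y_true, y_pred):
    """RMSE divided by the range of the observed values.

    Assumption: range normalization. The abstract does not say which
    normalization the authors used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    spread = max(y_true) - min(y_true)
    if spread == 0:
        raise ValueError("observed values have zero range")
    return math.sqrt(mse) / spread


# Figures reported in the abstract: (NRMSE, average computing time in ms).
reported = {
    "random forest": (0.75, 16.95),
    "XGBoost": (0.71, 8.28),
    "LightGBM": (0.63, 2.81),
    "extra trees": (0.72, 17.15),
}

# Lower is better for both the error and the computing time.
best_by_error = min(reported, key=lambda name: reported[name][0])
best_by_time = min(reported, key=lambda name: reported[name][1])
```

Selecting the minimum on each measure separately matches how the abstract reports its result: one model is named best on both predictive performance and computing time.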

Nelson, J. Tjen, and G. Hoendarto, “Assessing the performance of different variations of ensembled tree models in chlorophyll concentration prediction”, Soc. sci. humanities j., vol. 9, no. 01, pp. 6401–6409, Jan. 2025, doi: 10.18535/sshj.v9i01.1461.

This work is licensed under a Creative Commons Attribution 4.0 International License.
