Development of a Machine Learning Model for a Chiller using Random Forest Algorithm and Data Pre-processing

랜덤 포레스트와 데이터 전처리를 이용한 냉동기 기계학습 모델 개발

  • Received : 2017.04.24
  • Accepted : 2017.07.24
  • Published : 2017.09.30

Abstract

It is widely acknowledged that a machine learning model can serve as a surrogate for a first-principles-based dynamic simulation model. The accuracy and computational efficiency of such a model depend on the combination of input variables. The random forest algorithm can calculate a variable importance that quantifies the influence of each input variable on the model output. In this study, the authors developed three random forest models of a chiller in an existing building: (1) Model A, with 12 measured variables from building energy management system (BEMS) data; (2) Model B, with 2 measured input variables plus 4 new variables constructed by random selection; and (3) Model C, with 4 measured input variables plus 2 new variables constructed from a physics-based equation. The CVRMSE values of the three models are 8.56%, 5.44%, and 4.28%, respectively. The findings can be summarized in three points: (1) all three random forest models describe the dynamics of the chiller system adequately, (2) the random forest algorithm can be used to develop a simulation model of the system, and (3) an accurate model can be constructed by either random selection or a physics-based equation, even when only a few input variables are available.
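The two quantities the abstract relies on, CVRMSE and variable importance, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `cvrmse` and `permutation_importance` are hypothetical, the CVRMSE here divides by the number of samples n (some formulations divide by n minus the number of model parameters instead), and importance is approximated as the increase in CVRMSE after shuffling one input column, which mirrors the permutation idea used by random forests but is applied here to an arbitrary prediction function rather than to out-of-bag samples.

```python
import math
import random


def cvrmse(measured, predicted):
    """Coefficient of variation of the RMSE, in percent (assumes n in the denominator)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(measured)
    mse = sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n
    mean_measured = sum(measured) / n
    return 100.0 * math.sqrt(mse) / mean_measured


def permutation_importance(predict, rows, measured, column, seed=0):
    """Increase in CVRMSE (percentage points) after shuffling one input column.

    `predict` is any callable mapping one input row (a list) to a prediction;
    it stands in for a trained model and is an assumption of this sketch.
    """
    baseline = cvrmse(measured, [predict(row) for row in rows])

    # Shuffle only the chosen column, leaving all other inputs untouched.
    shuffled_values = [row[column] for row in rows]
    random.Random(seed).shuffle(shuffled_values)

    permuted_rows = []
    for row, value in zip(rows, shuffled_values):
        new_row = list(row)
        new_row[column] = value
        permuted_rows.append(new_row)

    permuted = cvrmse(measured, [predict(row) for row in permuted_rows])
    return permuted - baseline
```

Under these assumptions, an input that the model ignores yields an importance of zero, while an influential input yields a larger CVRMSE once it is shuffled. A lower CVRMSE, as reported for Model C relative to Model A, indicates a smaller prediction error relative to the mean measured value.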

Acknowledgement

Supported by: Korea Institute of Energy Technology Evaluation and Planning (KETEP)
