Effect of data imbalance in Machine Learning Models for building energy performance prediction

Seraj, Hamidreza, Bahadori-Jahromi, Ali ORCID: https://orcid.org/0000-0003-0405-7146 and Tahayori, Hooman (2024) Effect of data imbalance in Machine Learning Models for building energy performance prediction. In: 2nd International Conference of Artificial Intelligence and Software Engineering, 24-26 December, 2024, Shiraz, Iran.

Effect of Data Imbalance in Machine Learning Models for Building Energy Performance Prediction.pdf - Accepted Version


Abstract

One promising approach to investigating building energy performance that has recently gained attention is the use of data-driven AI methods, such as machine learning (ML) models. Despite their advantages over physics-based models, such as faster prediction and simpler application to case study buildings, these methods pose challenges during the model development stage, including the need for large datasets, potential overfitting, and the difficulty of capturing complex physical interactions within building energy systems. This research therefore aims to address the data imbalance issue in developing an ML model for predicting the Energy Performance Certificate (EPC) rating of residential buildings. In this context, two resampling methods, SMOTE (Synthetic Minority Over-sampling Technique) and SMOTE-Tomek, were applied to an XGBoost ML model to improve its accuracy. The results showed that although applying data resampling methods slightly reduced the model's overall accuracy score (by less than 2%), it significantly enhanced the model's ability to predict minority classes. Specifically, the model's performance in predicting labels B, F, and E improved by more than 7, 10, and 6 percentage points, respectively. This highlights how class imbalance in EPC labels can distort evaluation metrics like accuracy, potentially masking poor performance in minority classes. Addressing this imbalance is crucial for effectively integrating ML models into more advanced AI tools and smart systems for comprehensive building performance analysis.
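
The point about accuracy masking minority-class performance can be illustrated with a minimal sketch. The labels and counts below are hypothetical toy values chosen for illustration, not data or results from the paper, and the helper names `accuracy` and `per_class_recall` are assumptions introduced here.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true label, the fraction of its samples predicted correctly."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / support[label] for label in sorted(support)}


# Hypothetical imbalanced toy set: 90 samples of a majority label "D"
# and 10 samples of a minority label "F" (illustrative values only).
y_true = ["D"] * 90 + ["F"] * 10

# A hypothetical classifier that gets every "D" right but recognises
# only 2 of the 10 "F" samples, labelling the other 8 as "D".
y_pred = ["D"] * 90 + ["F"] * 2 + ["D"] * 8

overall = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)
# Mean of the per-class recalls, often called balanced accuracy.
balanced = sum(recalls.values()) / len(recalls)

print(f"overall accuracy: {overall:.2f}")
for label, value in recalls.items():
    print(f"recall for {label}: {value:.2f}")
print(f"balanced accuracy: {balanced:.2f}")
```

In this toy case the overall accuracy looks high even though most minority samples are misclassified, which is why per-class metrics are worth reporting alongside accuracy when evaluating resampling methods such as SMOTE or SMOTE-Tomek.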

Item Type: Conference or Workshop Item (Paper)
Identifier: AISOFT02_033
Page Range: pp. 1-5
Keywords: Imbalanced data, machine learning, SMOTE, EPC rating, Building energy performance
Subjects: Construction and engineering > Civil and structural engineering
Depositing User: ALI BAHADORI-JAHROMI
Date Deposited: 08 Apr 2025 12:17
Last Modified: 08 Apr 2025 14:00
URI: https://repository.uwl.ac.uk/id/eprint/13423
Sustainable Development Goals: Goal 11: Sustainable Cities and Communities
