Seraj, Hamidreza, Bahadori-Jahromi, Ali ORCID: https://orcid.org/0000-0003-0405-7146 and Tahayori, Hooman
(2024)
Effect of data imbalance in Machine Learning Models for building energy performance prediction.
In: 2nd International Conference of Artificial Intelligence and Software Engineering, 24-26 December, 2024, Shiraz, Iran.
PDF: Effect of Data Imbalance in Machine Learning Models for Building Energy Performance Prediction.pdf (Accepted Version, 382kB)
Abstract
One promising approach that has recently gained attention for investigating building energy performance is the use of AI data-driven methods, such as machine learning (ML) models. Although these methods have advantages over physics-based models, such as faster prediction and simpler application to case study buildings, they present challenges during model development, including the need for large datasets, the risk of overfitting, and the difficulty of capturing complex physical interactions within building energy systems. Accordingly, this research aims to address the data imbalance issue in developing an ML model for predicting the Energy Performance Certificate (EPC) rating of residential buildings. In this context, two resampling methods, SMOTE (Synthetic Minority Over-sampling Technique) and SMOTE-Tomek, were applied with an XGBoost ML model to improve its predictive performance. The results of this study showed that although applying data resampling methods slightly reduced the model's overall accuracy score (by less than 2%), it significantly enhanced the model's ability to predict minority classes. Specifically, the model's performance in predicting labels B, F, and E improved by more than 7, 10, and 6 percentage points, respectively. This highlights how class imbalance in EPC labels can distort evaluation metrics such as accuracy, potentially masking poor performance on minority classes. Addressing this imbalance is crucial for effectively integrating ML models into more advanced AI tools and smart systems for comprehensive building performance analysis.
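The abstract names SMOTE and SMOTE-Tomek but gives no implementation details. As an illustration only, and not the authors' code, the pure-Python sketch below shows what the two resampling steps do on toy data. SMOTE creates synthetic minority-class samples by interpolating between a sample and one of its k nearest same-class neighbours. SMOTE-Tomek then removes Tomek links, which are pairs of samples with different labels that are each other's nearest neighbour. The function names `smote`, `remove_tomek_links` and `smote_tomek` are hypothetical. Balancing every class up to the majority count and dropping both members of each link are assumptions of this sketch. A real pipeline would more likely use the `SMOTE` and `SMOTETomek` classes from imbalanced-learn, applied to the training split only, before fitting the XGBoost classifier.

```python
import math
import random
from collections import Counter


def smote(X, y, k=5, seed=0):
    """SMOTE-style oversampling (sketch): raise every class to the majority count.

    Each synthetic point lies on the segment between a randomly chosen sample
    and one of its k nearest neighbours from the same class.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out = [list(x) for x in X]
    y_out = list(y)
    for label, n in counts.items():
        if n == target or n < 2:
            continue  # already the majority, or too few points to interpolate
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            # k nearest same-class neighbours of sample i (brute force, excluding i)
            neighbours = sorted(
                (j for j in idx if j != i), key=lambda j: math.dist(X[i], X[j])
            )[:k]
            j = rng.choice(neighbours)
            gap = rng.random()  # interpolation factor in [0, 1)
            X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
            y_out.append(label)
    return X_out, y_out


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link (opposite-label mutual nearest neighbours)."""
    n = len(X)
    nearest = [
        min((j for j in range(n) if j != i), key=lambda j: math.dist(X[i], X[j]))
        for i in range(n)
    ]
    drop = {i for i in range(n) if y[i] != y[nearest[i]] and nearest[nearest[i]] == i}
    keep = [i for i in range(n) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, k=5, seed=0):
    """SMOTE oversampling followed by Tomek-link cleaning."""
    X_res, y_res = smote(X, y, k=k, seed=seed)
    return remove_tomek_links(X_res, y_res)


if __name__ == "__main__":
    # Toy two-feature data (not EPC data): 3 samples labelled "B", 5 labelled "D"
    X = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6], [6, 6], [5.5, 5.5]]
    y = ["B", "B", "B", "D", "D", "D", "D", "D"]
    X_res, y_res = smote_tomek(X, y)
    print(Counter(y_res))
```

The neighbour search here is brute force and quadratic in the number of samples, so it only suits small illustrations. Because resampling changes the class distribution the model is trained on, overall accuracy on an imbalanced test set can dip slightly while per-class recall for minority labels rises, which is the kind of trade-off the abstract reports.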
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Identifier: | AISOFT02_033 |
Page Range: | pp. 1-5 |
Keywords: | Imbalanced data, machine learning, SMOTE, EPC rating, Building energy performance |
Subjects: | Construction and engineering > Civil and structural engineering |
Depositing User: | ALI BAHADORI-JAHROMI |
Date Deposited: | 08 Apr 2025 12:17 |
Last Modified: | 08 Apr 2025 14:00 |
URI: | https://repository.uwl.ac.uk/id/eprint/13423 |
Sustainable Development Goals: | Goal 11: Sustainable Cities and Communities |