Ivanova, Mariya, Russo, Nicola, Mihaylov, Gueorgui and Konstantin, Nikolic ORCID: https://orcid.org/0000-0002-6551-2977
(2025)
Leveraging 13C NMR spectrum data derived from SMILES for machine learning-based prediction of a small biomolecule functionality: a case study on human Dopamine D1 receptor antagonists.
Advanced Intelligent Discovery.
(Submitted)
Abstract
This study contributes to ongoing research on predicting the functionality of small biomolecules from Carbon-13 Nuclear Magnetic Resonance (13C NMR) spectrum data using machine learning (ML). A bioassay on human dopamine D1 receptor antagonists was used to demonstrate the approach. The Simplified Molecular Input Line Entry System (SMILES) notations of the compounds were extracted and converted into spectroscopic data using purpose-built software. These data were then used for machine learning with scikit-learn algorithms. The ML models were trained on 27,756 samples and tested on 5,466. Of the estimators tested (K-Nearest Neighbors, Decision Tree Classifier, Random Forest Classifier, Gradient Boosting Classifier, XGBoost Classifier, and Support Vector Classifier), the Support Vector Classifier was the most effective, achieving 71.5% accuracy and a cross-validation score of 0.749. The methodology can be applied to predict the functionality of any compound, provided relevant data are available. It was also hypothesized that a larger number of samples would improve accuracy. In addition, a time- and cost-efficient CID_SID ML model was developed, allowing compounds to be checked for D1 receptor antagonist activity using only their PubChem identifiers. This model achieved 80.2% accuracy and a five-fold cross-validation score of 0.8071.
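The evaluation protocol named in the abstract (accuracy on a held-out test set plus a five-fold cross-validation score) can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than 13C NMR features, the classifier is a toy nearest-neighbour vote rather than any of the scikit-learn estimators used in the study, and the function names (`make_toy_data`, `knn_predict`, `cross_val_scores`) are hypothetical.

```python
import random


def make_toy_data(n=200, n_features=8, seed=0):
    """Synthetic stand-in for per-compound feature vectors with binary labels."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate classes so both are equally represented
        # Class 1 is shifted by +1.5 in every feature (an arbitrary choice).
        X.append([rng.gauss(1.5 * label, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = [label for _, label in dists[:k]]
    return 1 if sum(votes) * 2 > len(votes) else 0


def accuracy(train_X, train_y, test_X, test_y, k=5):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(
        knn_predict(train_X, train_y, x, k) == t for x, t in zip(test_X, test_y)
    )
    return hits / len(test_y)


def cross_val_scores(X, y, n_folds=5, k=5, seed=0):
    """Accuracy on each of n_folds disjoint validation folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        scores.append(
            accuracy(
                [X[i] for i in train], [y[i] for i in train],
                [X[i] for i in fold], [y[i] for i in fold], k,
            )
        )
    return scores


X, y = make_toy_data()
split = int(0.8 * len(X))  # 80/20 hold-out split, chosen arbitrarily
test_acc = accuracy(X[:split], y[:split], X[split:], y[split:])
cv_scores = cross_val_scores(X[:split], y[:split])
cv_mean = sum(cv_scores) / len(cv_scores)
print(f"hold-out accuracy: {test_acc:.3f}, five-fold CV mean: {cv_mean:.3f}")
```

The cross-validation folds are drawn only from the training portion, so the held-out accuracy stays an independent estimate. Whether the study followed the same arrangement is not stated in the abstract.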
Item Type: | Article |
---|---|
Keywords: | scikit-learn, drug development, drug discovery, CID-SID ML model, neurotransmitter |
Subjects: | Computing > Intelligent systems |
Depositing User: | Mariya Ivanova |
Date Deposited: | 17 Sep 2025 13:06 |
Last Modified: | 17 Sep 2025 13:45 |
URI: | https://repository.uwl.ac.uk/id/eprint/14071 |