Machine Learning - driven insights for predicting the impact of nanoparticles on the functionality of biomolecules, Illustrated by the case of DNA Damage-Inducible Transcript 3 (CHOP) inhibitors

Ivanova, Mariya, Russo, Nicola, Mihaylov, Gueorgui and Konstantin, Nikolic ORCID: https://orcid.org/0000-0002-6551-2977 (2025) Machine Learning - driven insights for predicting the impact of nanoparticles on the functionality of biomolecules, Illustrated by the case of DNA Damage-Inducible Transcript 3 (CHOP) inhibitors. IEEE Transactions on Pattern Analysis and Machine Intelligence. ISSN 0162-8828 (Submitted)

Machine Learning - driven insights for predicting the impact of nanoparticles_preprint_IvanovaM_accessible.pdf - Submitted Version

Appendices_Machine Learning driven insights for predicting_preprint_IvanovaM_accessible.pdf - Supplemental Material


Abstract

The presented study contributes to ongoing research that aims to overcome challenges in predicting the bio-applicability of nanoparticles (NPs). The approach explored a variety of combinations of nuclear magnetic resonance (NMR) spectroscopy data derived from simplified molecular-input line-entry system (SMILES) notations and small biomolecule features. The resulting datasets were used for machine learning (ML) with scikit-learn and deep neural networks (DNN) with PyTorch. Despite the obstacles in predicting how NPs influence biomolecule functionalities, the methodology was reasoned in terms of its applicability to compounds both with and without NPs. The methodology was illustrated through a quantitative high-throughput screening (qHTS) aimed at finding DNA Damage-Inducible Transcript 3 (CHOP) inhibitors. Based on this data, the best ML performance was achieved by the Random Forest Classifier, which was trained on 19,184 samples and tested on 4,000 samples, resulting in 81.1% accuracy, 83.4% precision, 77.7% recall, 80.4% F1-score, 81.1% ROC, and a five-fold cross-validation score of 0.821. Complementing the main study, two computational approaches were developed to enhance CHOP inhibitor prediction. The first identifies the most desirable and undesirable functional groups for CHOP inhibition. The second, a CID_SID ML model, achieved 90.1% accuracy in predicting whether compounds designed for other purposes possess CHOP inhibition potential.
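As a reading aid for the metrics quoted in the abstract, the following minimal sketch shows the standard definitions of accuracy, precision, recall, and F1-score, and checks that the reported F1-score is the harmonic mean of the reported precision and recall. It is not the authors' code. The function names and the toy confusion counts are assumptions introduced purely for illustration; only the 83.4% precision and 77.7% recall come from the abstract.

```python
# Standard binary-classification metrics from confusion-matrix counts.
# tp/fp/tn/fn = true positives, false positives, true negatives, false negatives.

def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)

def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Fraction of true positives that the model recovers."""
    return tp / (tp + fn)

def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)

# Toy counts (hypothetical, NOT from the study) to exercise the definitions.
toy = {"tp": 8, "fp": 2, "tn": 8, "fn": 2}
toy_precision = precision(toy["tp"], toy["fp"])
toy_recall = recall(toy["tp"], toy["fn"])
toy_accuracy = accuracy(**toy)

# Precision and recall as reported in the abstract.
reported_precision = 0.834
reported_recall = 0.777
f1 = f1_score(reported_precision, reported_recall)
print(f"F1 from reported precision and recall: {f1:.3f}")
```

The computed value rounds to 0.804, which agrees with the 80.4% F1-score stated in the abstract.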

Item Type: Article
Additional Information: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
Keywords: scikit-learn, PyTorch, SMILES, NMR, CID_SID ML model
Depositing User: Mariya Ivanova
Date Deposited: 18 Sep 2025 09:10
Last Modified: 18 Sep 2025 09:30
URI: https://repository.uwl.ac.uk/id/eprint/14077
