Ivanova, Mariya, Russo, Nicola, Mihaylov, Gueorgui and Nikolic, Konstantin ORCID: https://orcid.org/0000-0002-6551-2977
(2025)
Leveraging Machine Learning and IUPAC names to identify TDP1 inhibitors.
Computational and Structural Biotechnology.
(Submitted)
Abstract
This paper introduces a series of computational approaches to assist drug developers in discovering new inhibitors of human tyrosyl-DNA phosphodiesterase 1 (TDP1), using a PubChem bioassay as a case study. The methods rest on a custom dataset generated by tokenizing the IUPAC names of compounds into over 5,000 distinct functional group fragments, with labels derived from the initial TDP1 inhibition data. First, a RandomForestClassifier (RFC) model was developed to predict whether a compound is a TDP1 inhibitor. Trained on more than 94,000 samples, the model achieved an accuracy of 70.9% and an ROC score of 70.8%. Building on this, two further approaches gave deeper insight into the structural characteristics that influence inhibition. Reordering the feature importance list by the proportion of active cases revealed the expected effects of specific functional groups, and this was refined by pinpointing the most and least desirable fragments for TDP1 inhibition. A separate CID_SID ML model was also developed, using only compound and substance identifiers from PubChem. This model outperformed the RFC model, achieving an accuracy of 85.2% and a precision of 94.2%, which demonstrates the potential for rapid screening from simplified input data. Collectively, these methods offer computational tools for accelerating the drug discovery process.
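The fragment-ranking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the splitting rule in `tokenize_iupac`, the helper names, and the toy activity labels are all assumptions made for illustration, since this record does not describe the paper's actual tokenizer or data.

```python
import re
from collections import defaultdict


def tokenize_iupac(name):
    """Split an IUPAC name into alphabetic fragments.

    Assumed rule (hypothetical, not from the paper): lowercase the name,
    split on anything that is not a letter, and drop one-letter pieces
    such as locants like "N".
    """
    pieces = re.split(r"[^a-z]+", name.lower())
    return sorted({p for p in pieces if len(p) > 1})


def active_fraction_by_fragment(records):
    """Return {fragment: share of compounds containing it that are active}."""
    counts = defaultdict(lambda: [0, 0])  # fragment -> [n_active, n_total]
    for name, is_active in records:
        for fragment in tokenize_iupac(name):
            counts[fragment][0] += int(is_active)
            counts[fragment][1] += 1
    return {frag: active / total for frag, (active, total) in counts.items()}


# Toy records: (IUPAC name, activity label). The labels are invented
# for illustration and are NOT real TDP1 bioassay results.
records = [
    ("N-(4-hydroxyphenyl)acetamide", 1),
    ("2-methylpropan-1-ol", 0),
    ("4-hydroxyphenyl acetate", 1),
    ("2-chloro-N-(4-hydroxyphenyl)acetamide", 0),
]

fractions = active_fraction_by_fragment(records)

# Rank fragments from highest to lowest share of active compounds.
ranking = sorted(fractions.items(), key=lambda item: item[1], reverse=True)
for fragment, share in ranking:
    print(f"{fragment}: {share:.2f}")
```

In a fuller pipeline the same fragments would typically become binary feature columns for a classifier such as scikit-learn's `RandomForestClassifier`; that step is left out here to keep the sketch dependency-free.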
Item Type: | Article |
---|---|
Keywords: | scikit-learn, PubChem, HTS, bioassay, CID_SID ML model |
Subjects: | Computing > Intelligent systems |
Depositing User: | Mariya Ivanova |
Date Deposited: | 17 Sep 2025 14:55 |
Last Modified: | 17 Sep 2025 15:00 |
URI: | https://repository.uwl.ac.uk/id/eprint/14074 |