On the ability of machine learning methods to discover new scaffolds

Jagdev, Rishi, Madsen, Thomas ORCID: https://orcid.org/0000-0001-9354-0935 and Finn, Paul W. (2022) On the ability of machine learning methods to discover new scaffolds. Journal of Molecular Modeling.

Full text not available from this repository.


Recent advances in the application of machine learning to drug discovery have made it a "hot topic" for research, with hundreds of academic groups and companies integrating machine learning into their drug discovery projects. Nevertheless, there remains great uncertainty regarding the most appropriate ways to evaluate the relative performance of these powerful methods against more traditional cheminformatics approaches, and many pitfalls remain for the unwary. Early in 2020, researchers at MIT (Stokes et al. [1]) reported the discovery of a new compound with antibacterial activity, halicin, through the use of a neural network machine learning method. A robust ability to identify novel active chemotypes through computational methods would be very useful. In this study, we have used the publicly available halicin dataset to compare the performance of this method to two other approaches, Mapping of Activity Through Dichotomic Scores (MADS) by Todeschini et al. [2] and Random Matrix Theory (RMT) by Lee et al. [3]. We have investigated the impact of dataset composition and standardization on performance. Our results demonstrate that all three methods are capable of predicting halicin as an active antibacterial compound, but that this result is dependent on the dataset composition, pre-processing and the molecular fingerprint used. Whilst no method can be declared the 'best' overall, MADS performs relatively better than the other methods in terms of general predictive performance. Additionally, the prediction analysis of both MADS and RMT is interpretable. We further investigate the scaffold hopping potential of the methods by removing the β-lactam and fluoroquinolone chemotypes from the dataset. MADS and RMT were able to identify actives in the test set that contained these substructures. This ability arises because of high scoring fragments of the withheld chemotypes that are in common with other active antibiotic classes. However, as there is no mechanistic relationship, the generality of the predictive ability of the methods is doubtful. We have further assessed overall performance as determined by several performance metrics.

Item Type: Article
Subjects: Computing
Depositing User: Thomas Madsen
Date Deposited: 10 Apr 2022 09:27
Last Modified: 06 Feb 2024 16:09
URI: https://repository.uwl.ac.uk/id/eprint/8916
