Combating small-molecule aggregation with machine learning
Biological screens are plagued by false-positive hits resulting from aggregation. Methods to triage small colloidally aggregating molecules (SCAMs) are in high demand. Herein, we disclose a neural network to flag such entities. Our data demonstrate the utility of machine learning for predicting SCAMs, achieving 80% correct predictions in an out-of-sample evaluation. The tool is competitive with a panel of expert chemists, who correctly predict 61% ± 7% of the same molecules in a Turing-like test. Our computational routine provides insight into features governing aggregation that had remained hidden to expert intuition. Further, we estimate that up to 15%–20% of ligands in publicly available chemogenomic databases have high potential to aggregate at a typical screening concentration (30 μM), which calls for caution in systems biology and drug design programs. Our approach provides a means to augment human intuition and mitigate attrition, and a pathway to accelerate future molecular medicine.