Learning a CoNCISE Language for Small-Molecule Binding
Rapid advances in deep learning have improved in silico methods for drug-target interaction (DTI) prediction. However, current methods struggle to scale to catalogs listing billions of commercially available small molecules. Here, we introduce CoNCISE, a method that accelerates DTI prediction by 2-3 orders of magnitude while maintaining high accuracy. CoNCISE employs a novel vector-quantized codebook approach and residual-learning-based training of hierarchical codes. Strikingly, we find that binding-specificity information in the small molecule space can be compressed into just 15 bits per compound, grouping all small molecules into 32,768 (2^15) hierarchically organized binding categories. Our DTI architecture combines these compact ligand representations with fixed-length protein embeddings in a cross-attention framework, achieving state-of-the-art prediction accuracy at unprecedented speed. We demonstrate CoNCISE's practical utility by indexing 6.4 billion ligands from the Enamine dataset, enabling researchers to query vast chemical libraries against a protein target in seconds.
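The abstract does not spell out the codebook configuration, so the sketch below assumes one split consistent with the stated numbers: three residual levels of 32 codes each (5 bits per level, 3 × 5 = 15 bits, 32^3 = 32,768 codes). It uses residual vector quantization as our reading of "vector-quantized codebook" plus "residual-learning-based training of hierarchical codes", with random codebooks standing in for trained ones. The names `rvq_encode`, `build_index`, `query`, and `score_code` are hypothetical. They illustrate one plausible way compact codes speed up library search: the model scores each occupied code once, so query cost is bounded by at most 32,768 code evaluations rather than by the number of ligands. This is not CoNCISE's published implementation.

```python
import random

# Assumed configuration (not from the paper): 3 levels x 32 codes = 15 bits.
LEVELS = 3
CODES_PER_LEVEL = 32
BITS_PER_LEVEL = 5  # 2**5 == 32
DIM = 8             # toy embedding size, chosen for illustration only


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_codebooks(seed=0):
    """Random codebooks stand in for learned ones (assumption: no training here)."""
    rng = random.Random(seed)
    books, scale = [], 1.0
    for _ in range(LEVELS):
        books.append([[rng.gauss(0.0, scale) for _ in range(DIM)]
                      for _ in range(CODES_PER_LEVEL)])
        scale *= 0.5  # deeper levels model smaller residuals
    return books


def rvq_encode(vec, books):
    """Residual VQ: pick the nearest codeword to the residual, subtract it, repeat."""
    residual, indices = list(vec), []
    for book in books:
        k = min(range(len(book)), key=lambda i: sq_dist(residual, book[i]))
        indices.append(k)
        residual = [r - c for r, c in zip(residual, book[k])]
    return indices


def pack(indices):
    """Concatenate per-level indices (coarse level first) into one 15-bit integer."""
    code = 0
    for k in indices:
        code = (code << BITS_PER_LEVEL) | k
    return code


def unpack(code):
    indices = []
    for _ in range(LEVELS):
        indices.append(code & (CODES_PER_LEVEL - 1))
        code >>= BITS_PER_LEVEL
    return indices[::-1]


def decode(code, books):
    """Reconstruct an embedding as the sum of the selected codewords."""
    vec = [0.0] * DIM
    for book, k in zip(books, unpack(code)):
        vec = [v + c for v, c in zip(vec, book[k])]
    return vec


def build_index(ligand_vecs, books):
    """Hypothetical index: bucket ligand ids by their 15-bit code."""
    index = {}
    for lig_id, vec in ligand_vecs.items():
        index.setdefault(pack(rvq_encode(vec, books)), []).append(lig_id)
    return index


def query(index, score_code, top_k=5):
    """Score each occupied code once, then expand the best codes into ligand ids."""
    hits = []
    for code in sorted(index, key=score_code, reverse=True):
        hits.extend(index[code])
        if len(hits) >= top_k:
            break
    return hits[:top_k]
```

Here `score_code` is a placeholder for a protein-conditioned scoring model; in CoNCISE that role is played by the cross-attention DTI network, which this sketch does not attempt to reproduce. Because every ligand sharing a code receives the same score, ranking happens at the level of codes and ligand ids are only looked up afterwards.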
Duke Scholars
DOI
Publication Date
Volume
Start / End Page
Related Subject Headings
- Artificial Intelligence & Image Processing
- 46 Information and computing sciences