Mixture models in the exploration of structure-activity relationships in drug design
We report on a study of mixture modeling problems arising in the assessment of chemical structure-activity relationships in drug design and discovery. Pharmaceutical research laboratories developing test compounds for screening synthesize many related candidate compounds by linking together collections of basic molecular building blocks, known as monomers. These compounds are tested for biological activity, and the results feed into screening for further analysis and drug design. The tests also provide data relating compound activity to chemical properties and aspects of the structure of the associated monomers, and our focus here is on studying such relationships as an aid to future monomer selection. The level of chemical activity of a compound depends on the geometry of its binding to target binding sites on receptor compounds, but the screening tests are unable to identify binding configurations. Hence potentially critical covariate information is missing, and the binding configuration enters as a natural latent variable. The resulting statistical models are then mixtures with respect to this missing information, which complicates data analysis and inference. This paper reports on a study of a two-monomer, two-binding-site framework and associated data. We build structured mixture models that mix linear regression models, predicting chemical effectiveness, with respect to site-binding selection mechanisms. We discuss aspects of modeling and analysis, including problems and pitfalls, and describe results of analyses of simulated and real data sets. In modeling the real data, we are led to critical model extensions that introduce hierarchical random effects components to adequately capture heterogeneities both in the site-binding mechanisms and in the resulting levels of effectiveness of compounds once bound. Comments on current and potential future directions conclude the report.
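To make the idea of mixing linear regressions over an unobserved binding configuration concrete, the following is a minimal illustrative sketch in Python, not the model developed in the paper. It assumes one scalar covariate per monomer (xa, xb), a single regression with one coefficient per binding site, a common Gaussian noise level sigma, and a single probability pi_ab that monomer A binds site 1 and monomer B binds site 2. All function names, parameter names, and numerical values are hypothetical.

```python
import numpy as np

def orientation_means(xa, xb, beta):
    """Regression means under the two assumed binding orientations."""
    b0, b1, b2 = beta
    mean_ab = b0 + b1 * xa + b2 * xb  # A at site 1, B at site 2
    mean_ba = b0 + b1 * xb + b2 * xa  # B at site 1, A at site 2
    return mean_ab, mean_ba

def normal_logpdf(y, mean, sigma):
    """Log density of a Gaussian with the given mean and standard deviation."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mean) / sigma) ** 2

def mixture_loglik_and_resp(y, xa, xb, beta, sigma, pi_ab):
    """Mixture log-likelihood and posterior probability of orientation AB."""
    mean_ab, mean_ba = orientation_means(xa, xb, beta)
    log_ab = np.log(pi_ab) + normal_logpdf(y, mean_ab, sigma)
    log_ba = np.log1p(-pi_ab) + normal_logpdf(y, mean_ba, sigma)
    log_mix = np.logaddexp(log_ab, log_ba)  # stable log of the two-term sum
    resp_ab = np.exp(log_ab - log_mix)      # P(orientation AB | y, xa, xb)
    return log_mix.sum(), resp_ab

def simulate(n, beta, sigma, pi_ab, seed=0):
    """Draw synthetic compounds with a hidden binding orientation z."""
    rng = np.random.default_rng(seed)
    xa = rng.normal(size=n)
    xb = rng.normal(size=n)
    z = rng.random(n) < pi_ab  # True means orientation AB
    mean_ab, mean_ba = orientation_means(xa, xb, beta)
    y = np.where(z, mean_ab, mean_ba) + sigma * rng.normal(size=n)
    return y, xa, xb, z

if __name__ == "__main__":
    beta = (1.0, 2.0, -1.0)  # hypothetical intercept and per-site slopes
    y, xa, xb, z = simulate(200, beta, sigma=0.5, pi_ab=0.7)
    loglik, resp = mixture_loglik_and_resp(y, xa, xb, beta, 0.5, 0.7)
    print("log-likelihood:", loglik)
    print("agreement of thresholded responsibilities with z:",
          np.mean((resp > 0.5) == z))
```

Under these assumptions, a compound whose two monomers have nearly equal covariate values gives nearly identical means under both orientations, so the data carry little information about which configuration occurred and the responsibility stays close to pi_ab. This is one simple way to see why the missing configuration complicates inference.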