Overview
My research focuses on developing new tools for probabilistic learning from complex data. Methods development is directly motivated by challenging applications in ecology/biodiversity, neuroscience, environmental health, criminal justice/fairness, and more. We seek to develop new modeling frameworks, algorithms, and corresponding code that can be used routinely by scientists and decision makers. We are also interested in new inference frameworks and in studying the theoretical properties of the methods we develop.
Some highlight application areas:
(1) Modeling of biological communities and biodiversity - we are considering global data on fungi, insects, birds, and other animals, including DNA sequences, images, audio, etc. The data contain large numbers of species unknown to science, and we would like to learn about these new species, community network structure, and the impact of environmental change and climate.
(2) Brain connectomics - based on high-resolution imaging data of the human brain, we are seeking to develop new statistical and machine learning models for relating brain networks to human traits and diseases.
(3) Environmental health & mixtures - we are building tools for relating chemical and other exposures (air pollution, etc.) to human health outcomes, accounting for spatial dependence in both exposures and disease. This includes an emphasis on infectious disease modeling, such as COVID-19.
Some statistical areas that play a prominent role in our methods development include models for low-dimensional structure in data (latent factors, clustering, geometric and manifold learning), flexible/nonparametric models (neural networks, Gaussian/spatial processes, other stochastic processes), Bayesian inference frameworks, efficient sampling and analytic approximation algorithms, and models for "object data" (trees, networks, images, spatial processes, etc.).
Current Appointments & Affiliations
Arts and Sciences Distinguished Professor of Statistical Science · 2013 - Present · Statistical Science, Trinity College of Arts & Sciences
Professor of Statistical Science · 2008 - Present · Statistical Science, Trinity College of Arts & Sciences
Professor in the Department of Mathematics · 2014 - Present · Mathematics, Trinity College of Arts & Sciences
Faculty Network Member of the Duke Institute for Brain Sciences · 2011 - Present · Duke Institute for Brain Sciences, University Institutes and Centers
Recent Publications
A digital twin for real-time biodiversity forecasting with citizen science data.
Journal Article · Nature ecology & evolution · March 2026
Citizen science provides large amounts of biodiversity data. Key challenges in unlocking its full potential include engaging citizens with limited species identification skills and accelerating the transition from data collection to research and monitoring ...
Bayesian Nonparametric Modeling of Latent Partitions via Stirling-Gamma Priors
Journal Article · Bayesian Analysis · March 1, 2026
Dirichlet process mixtures are particularly sensitive to the value of the precision parameter controlling the behavior of the latent partition. Randomization of the precision through a prior distribution is a common solution, which leads to more robust inf ...
Bag of DAGs: Inferring Directional Dependence in Spatiotemporal Processes
Journal Article · Bayesian Analysis · March 1, 2026
We propose a class of nonstationary processes to characterize space- and time-varying directional associations in point-referenced data. We are motivated by spatiotemporal modeling of air pollutants in which local wind patterns are key determinants of the p ...
Recent Grants
Inferring Binary Feature Profiles Underlying Patient Health, with Applications to Sepsis
Research · Co-Principal Investigator · Awarded by National Institutes of Health · 2026 - 2030
Duke University Program in Environmental Health
Inst. Training Prgm or CME · Mentor · Awarded by National Institute of Environmental Health Sciences · 2019 - 2029
Improving inferences on health effects of chemical exposures
Research · Principal Investigator · Awarded by National Institute of Environmental Health Sciences · 2023 - 2028
Education
Emory University · 1997 · Ph.D.
Pennsylvania State University · 1994 · B.S.