Overview
My research focuses on developing new tools for probabilistic learning from complex data. Methods development is directly motivated by challenging applications in ecology/biodiversity, neuroscience, environmental health, criminal justice/fairness, and more. We seek to develop new modeling frameworks, algorithms, and corresponding code that can be used routinely by scientists and decision makers. We are also interested in new inference frameworks and in studying the theoretical properties of the methods we develop.
Some highlight application areas:
(1) Modeling of biological communities and biodiversity - we are considering global data on fungi, insects, birds, and other animals, including DNA sequences, images, audio, etc. The data contain large numbers of species unknown to science, and we would like to learn about these new species, community network structure, and the impact of environmental change and climate.
(2) Brain connectomics - based on high-resolution imaging data of the human brain, we are seeking to develop new statistical and machine learning models for relating brain networks to human traits and diseases.
(3) Environmental health & mixtures - we are building tools for relating chemical and other exposures (air pollution, etc.) to human health outcomes, accounting for spatial dependence in both exposures and disease. This includes an emphasis on infectious disease modeling, such as COVID-19.
Some statistical areas that play a prominent role in our methods development include models for low-dimensional structure in data (latent factors, clustering, geometric and manifold learning), flexible/nonparametric models (neural networks, Gaussian/spatial processes, other stochastic processes), Bayesian inference frameworks, efficient sampling and analytic approximation algorithms, and models for "object data" (trees, networks, images, spatial processes, etc.).
Current Appointments & Affiliations
Arts and Sciences Distinguished Professor of Statistical Science · 2013 - Present
Statistical Science, Trinity College of Arts & Sciences
Professor of Statistical Science · 2008 - Present
Statistical Science, Trinity College of Arts & Sciences
Professor in the Department of Mathematics · 2014 - Present
Mathematics, Trinity College of Arts & Sciences
Faculty Network Member of the Duke Institute for Brain Sciences · 2011 - Present
Duke Institute for Brain Sciences, University Institutes and Centers
Recent Publications
Common to rare transfer learning (CORAL) enables inference and prediction for a quarter million rare Malagasy arthropods.
Journal Article · Nature Methods · October 2025
DNA-based biodiversity surveys result in massive-scale data, including up to millions of species, of which most are rare. Making the most of such data for inference and prediction requires modeling approaches that can relate species occurrences to environm ...
Graph neural networks and cortical column modeling for AI-based brain age prediction in Alzheimer’s disease risk
Conference · Proceedings of SPIE the International Society for Optical Engineering · September 17, 2025
Alzheimer’s disease (AD) affects over 10% of people above age 65. Current treatments remain largely ineffective, thus early biomarkers are essential for devising preventive interventions, and personalizing these based on risk profiles. Brain age gap (BAG) ...
Bayesian learning of clinically meaningful sepsis phenotypes in Northern Tanzania.
Journal Article · Ann Appl Stat · September 2025
Sepsis is a life-threatening condition caused by a dysregulated host response to infection. Recently, researchers have hypothesized that sepsis consists of a heterogeneous spectrum of distinct subtypes, motivating several studies to identify clusters of se ...
Recent Grants
Duke University Program in Environmental Health
Inst. Training Prgm or CME · Mentor · Awarded by National Institute of Environmental Health Sciences · 2019 - 2029
Improving inferences on health effects of chemical exposures
Research · Principal Investigator · Awarded by National Institute of Environmental Health Sciences · 2023 - 2028
R01: Genetic Origins of Adverse Outcomes in African Americans with Lymphoma
Research · Co Investigator · Awarded by National Institutes of Health · 2023 - 2028
Education, Training & Certifications
Ph.D. · Emory University · 1997
B.S. · Pennsylvania State University · 1994