Overview
My research focuses on developing new tools for probabilistic learning from complex data. Methods development is directly motivated by challenging applications in ecology/biodiversity, neuroscience, environmental health, criminal justice/fairness, and more. We seek to develop new modeling frameworks, algorithms, and corresponding code that can be used routinely by scientists and decision makers. We are also interested in new inference frameworks and in studying theoretical properties of the methods we develop.
Some highlight application areas:
(1) Modeling of biological communities and biodiversity - we are considering global data on fungi, insects, birds and animals including DNA sequences, images, audio, etc. Data contain large numbers of species unknown to science and we would like to learn about these new species, community network structure, and the impact of environmental change and climate.
(2) Brain connectomics - based on high-resolution imaging data of the human brain, we are seeking to develop new statistical and machine learning models for relating brain networks to human traits and diseases.
(3) Environmental health & mixtures - we are building tools for relating chemical and other exposures (air pollution, etc.) to human health outcomes, accounting for spatial dependence in both exposures and disease. This includes an emphasis on infectious disease modeling, such as COVID-19.
Some statistical areas that play a prominent role in our methods development include models for low-dimensional structure in data (latent factors, clustering, geometric and manifold learning), flexible/nonparametric models (neural networks, Gaussian/spatial processes, other stochastic processes), Bayesian inference frameworks, efficient sampling and analytic approximation algorithms, and models for "object data" (trees, networks, images, spatial processes, etc.).
Current Appointments & Affiliations
Arts and Sciences Distinguished Professor of Statistical Science · 2013 - Present · Statistical Science, Trinity College of Arts & Sciences
Professor of Statistical Science · 2008 - Present · Statistical Science, Trinity College of Arts & Sciences
Professor in the Department of Mathematics · 2014 - Present · Mathematics, Trinity College of Arts & Sciences
Faculty Network Member of the Duke Institute for Brain Sciences · 2011 - Present · Duke Institute for Brain Sciences, University Institutes and Centers
Recent Publications
Accelerated algorithms for convex and non-convex optimization on manifolds
Journal Article · Machine Learning · March 1, 2025
We propose a general scheme for solving convex and non-convex optimization problems on manifolds. The central idea is that, by adding a multiple of the squared retraction distance to the objective function in question, we “convexify” the objective function ...
Bayesian Clustering via Fusing of Localized Densities
Journal Article · Journal of the American Statistical Association · January 1, 2025
Bayesian clustering typically relies on mixture models, with each component interpreted as a different cluster. After defining a prior for the component parameters and weights, Markov chain Monte Carlo (MCMC) algorithms are commonly used to produce samples ...
Radial neighbours for provably accurate scalable approximations of Gaussian processes
Journal Article · Biometrika · December 1, 2024
In geostatistical problems with massive sample size, Gaussian processes can be approximated using sparse directed acyclic graphs to achieve scalable O(n) computational complexity. In these models, data at each location are typically assumed conditionally d ...
Recent Grants
Duke University Program in Environmental Health
Inst. Training Prgm or CME · Mentor · Awarded by National Institutes of Health · 2019 - 2029
Improving inferences on health effects of chemical exposures
Research · Principal Investigator · Awarded by National Institute of Environmental Health Sciences · 2023 - 2028
R01: Genetic Origins of Adverse Outcomes in African Americans with Lymphoma
Research · Co Investigator · Awarded by National Institutes of Health · 2023 - 2028
Education, Training & Certifications
Emory University · 1997 · Ph.D.
Pennsylvania State University · 1994 · B.S.