Li Ma

Professor of Statistical Science
Statistical Science
217 Old Chemistry Bldg, Box 90251, Durham, NC 27708-0251

Selected Publications


Efficient in-situ image and video compression through probabilistic image representation

Journal Article · Signal Processing · February 1, 2024
Fast and effective image compression for multi-dimensional images has become increasingly important for efficient storage and transfer of massive amounts of high-resolution images and videos. In this paper, we present an efficient in-situ method for multi- ...

Hidden Markov Pólya Trees for High-Dimensional Distributions

Journal Article · Journal of the American Statistical Association · January 1, 2024
The Pólya tree (PT) process is a general-purpose Bayesian nonparametric model that has found wide application in a range of inference problems. It has a simple analytic form and the posterior computation boils down to beta-binomial conjugate updates along ...

Microbiome subcommunity learning with logistic-tree normal latent Dirichlet allocation.

Journal Article · Biometrics · September 2023
Mixed-membership (MM) models such as latent Dirichlet allocation (LDA) have been applied to microbiome compositional data to identify latent subcommunities of microbial species. These subcommunities are informative for understanding the biological interpla ...

Controlling taxa abundance improves metatranscriptomics differential analysis.

Journal Article · BMC Microbiol · March 7, 2023
BACKGROUND: A common task in analyzing metatranscriptomics data is to identify microbial metabolic pathways with differential RNA abundances across multiple sample groups. With information from paired metagenomics data, some differential methods control fo ...

Learning Asymmetric and Local Features in Multi-Dimensional Data Through Wavelets With Recursive Partitioning.

Journal Article · IEEE Transactions on Pattern Analysis and Machine Intelligence · November 2022
Effective learning of asymmetric and local features in images and other data observed on multi-dimensional grids is a challenging objective critical for a wide range of image processing applications involving biomedical and natural images. It requires meth ...

Multi-scale Fisher's independence test for multivariate dependence.

Journal Article · Biometrika · September 2022
Identifying dependency in multivariate data is a common inference task that arises in numerous applications. However, existing nonparametric independence tests typically require computation that scales at least quadratically with the sample size, making it ...

Dirichlet-tree multinomial mixtures for clustering microbiome compositions.

Journal Article · The Annals of Applied Statistics · September 2022
Studying the human microbiome has gained substantial interest in recent years, and a common task in the analysis of these data is to cluster microbiome compositions into subtypes. This subdivision of samples into subgroups serves as an intermediary step in ...

Profiling the quantitative occupancy of myriad transcription factors across conditions by modeling chromatin accessibility data.

Journal Article · Genome Res · June 2022
Over a thousand different transcription factors (TFs) bind with varying occupancy across the human genome. Chromatin immunoprecipitation (ChIP) can assay occupancy genome-wide, but only one TF at a time, limiting our ability to comprehensively observe the ...

Updating Urinary Microbiome Analyses to Enhance Biologic Interpretation.

Journal Article · Front Cell Infect Microbiol · 2022
OBJECTIVE: An approach for assessing the urinary microbiome is 16S rRNA gene sequencing, where analysis methods are rapidly evolving. This re-analysis of an existing dataset aimed to determine whether updated bioinformatic and statistical techniques affect ...

The Urinary Microbiome in Postmenopausal Women with Recurrent Urinary Tract Infections.

Journal Article · The Journal of Urology · November 2021
PURPOSE: The etiology of postmenopausal recurrent urinary tract infection (UTI) is not completely known, but the urinary microbiome is thought to be implicated. We compared the urinary microbiome in menopausal women with recurrent UTIs to age-matche ...

Chlorhexidine Gluconate Bathing Reduces the Incidence of Bloodstream Infections in Adults Undergoing Inpatient Hematopoietic Cell Transplantation.

Journal Article · Transplant Cell Ther · March 2021
Bloodstream infections (BSIs) occur in 20% to 45% of inpatient autologous and allogeneic hematopoietic cell transplant (HCT) patients. Daily bathing with the antiseptic chlorhexidine gluconate (CHG) has been shown to reduce the incidence of BSIs in critica ...

A phase 2 trial of the somatostatin analog pasireotide to prevent GI toxicity and acute GVHD in allogeneic hematopoietic stem cell transplant.

Journal Article · PLoS One · 2021
BACKGROUND: Allogeneic hematopoietic stem cell transplantation (HCT) is an often curative intent treatment, however it is associated with significant gastrointestinal (GI) toxicity and treatment related mortality. Graft-versus-host disease is a significant ...

Bayesian Graphical Compositional Regression for Microbiome Data

Journal Article · Journal of the American Statistical Association · April 2, 2020
An important task in microbiome studies is to test the existence of and give characterization to differences in the microbiome composition across groups of samples. Important challenges of this problem include the large within-group heterogeneities among s ...

A Bayesian hierarchical model for related densities by using Pólya trees

Journal Article · Journal of the Royal Statistical Society. Series B: Statistical Methodology · February 1, 2020
Bayesian hierarchical models are used to share information between related samples and to obtain more accurate estimates of sample level parameters, common structure and variation between samples. When the parameter of interest is the distribution or densi ...

CARP: Compression through Adaptive Recursive Partitioning for Multi-Dimensional Images

Conference Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition · January 1, 2020
Fast and effective image compression for multi-dimensional images has become increasingly important for efficient storage and transfer of massive amounts of high resolution images and videos. Desirable properties in compression methods include (1) high rec ...

Fisher Exact Scanning for Dependency

Journal Article · Journal of the American Statistical Association · January 2, 2019
We introduce a method, called Fisher exact scanning (FES), for testing and identifying variable dependency that generalizes Fisher’s exact test on 2 × 2 contingency tables to R × C contingency tables and continuous sample spaces. FES proceeds through scannin ...

Mixture modeling on related samples by ψ-stick breaking and kernel perturbation

Journal Article · Bayesian Analysis · January 1, 2019
There has been great interest recently in applying nonparametric kernel mixtures in a hierarchical manner to model multiple related data samples jointly. In such settings several data features are commonly present: (i) the related samples often share some, ...

Analysis of Distributional Variation Through Graphical Multi-Scale Beta-Binomial Models

Journal Article · Journal of Computational and Graphical Statistics · July 3, 2018
Many scientific studies involve comparing multiple datasets collected under different conditions to identify the difference in the underlying distributions. A common challenge in these multi-sample comparison problems is the presence of overdispersion, or ...

A phylogenetic scan test on a Dirichlet-tree multinomial model for microbiome data

Journal Article · Annals of Applied Statistics · March 1, 2018
In this paper, we introduce the phylogenetic scan test (PhyloScan) for investigating cross-group differences in microbiome compositions using the Dirichlet-tree multinomial (DTM) model. DTM models the microbiome data through a cascade of independent local ...

Probabilistic multi-resolution scanning for two-sample differences

Journal Article · Journal of the Royal Statistical Society. Series B: Statistical Methodology · March 1, 2017
We propose a multi-resolution scanning approach to identifying two-sample differences. Windows of multiple scales are constructed through nested dyadic partitioning on the sample space and a hypothesis regarding the two-sample difference is defined on each ...

Efficient functional ANOVA through wavelet-domain Markov groves

Journal Article · Journal of the American Statistical Association · 2017

Adaptive Shrinkage in Pólya Tree Type Models

Journal Article · Bayesian Analysis · September 2016

Scalable Bayesian Model Averaging Through Local Information Propagation

Journal Article · Journal of the American Statistical Association · April 3, 2015
This article shows that a probabilistic version of the classical forward-stepwise variable inclusion procedure can serve as a general data-augmentation scheme for model space distributions in (generalized) linear models. This latent variable representation ...

Adaptive testing of conditional association through recursive mixture modeling

Journal Article · Journal of the American Statistical Association · January 1, 2013
In many case-control studies, a central goal is to test for association or dependence between the predictors and the response. Relevant covariates must be conditioned on to avoid false positives and loss in power. Conditioning on covariates is easy in para ...

A sparse transmission disequilibrium test for haplotypes based on Bradley-Terry graphs.

Journal Article · Human Heredity · January 2012
BACKGROUND: Linkage and association analysis based on haplotype transmission disequilibrium can be more informative than single marker analysis. Several works have been proposed in recent years to extend the transmission disequilibrium test (TDT) to ...

Coupling optional Pólya trees and the two sample problem

Journal Article · Journal of the American Statistical Association · December 1, 2011
Testing and characterizing the difference between two data samples is of fundamental interest in statistics. Existing methods such as Kolmogorov-Smirnov and Cramér-von Mises tests do not scale well as the dimensionality increases and provide no easy way to ...

A method for unbiased estimation of population abundance along curvy margins

Journal Article · Environmetrics · May 1, 2011
Estimating species abundance via transects and quadrats has the advantage over other methods (such as mark-recapture) that they can be less expensive and do not require handling the animals. Transect-quadrat sampling along habitat boundaries with complex g ...

A four group cross-over design for measuring irreversible treatments on web search tasks

Journal Article · Proceedings of the Annual Hawaii International Conference on System Sciences · March 28, 2011
When trying to measure the effect of irreversible treatments such as training interventions, the choice of the experimental design can be difficult. A two group cross-over experimental design cannot be used due to longitudinal effects during the course of ...

An "almost exhaustive" search-based sequential permutation method for detecting epistasis in disease association studies.

Journal Article · Genetic Epidemiology · July 2010
Due to the complex nature of common diseases, their etiology is likely to involve "uncommon but strong" (UBS) interactive effects, i.e., allelic combinations that are each present in only a small fraction of the patients but associated with high disease ris ...

Optional Pólya tree and Bayesian inference

Journal Article · Annals of Statistics · June 1, 2010
We introduce an extension of the Pólya tree approach for constructing distributions on the space of probability measures. By using optional stopping and optional choice of splitting variables, the construction gives rise to random measures that are absolut ...