David B. Dunson CV

Journal Article Nature methods · October 2025 DNA-based biodiversity surveys result in massive-scale data, including up to millions of species, of which most are rare. Making the most of such data for inference and prediction requires modeling approaches that can relate species occurrences to environm ... Full text Cite

Graph neural networks and cortical column modeling for AI-based brain age prediction in Alzheimer’s disease risk

Conference Proceedings of SPIE the International Society for Optical Engineering · September 17, 2025 Alzheimer’s disease (AD) affects over 10% of people above age 65. Current treatments remain largely ineffective, thus early biomarkers are essential for devising preventive interventions, and personalizing these based on risk profiles. Brain age gap (BAG)— ... Full text Cite

BAYESIAN LEARNING OF CLINICALLY MEANINGFUL SEPSIS PHENOTYPES IN NORTHERN TANZANIA.

Journal Article Ann Appl Stat · September 2025 Sepsis is a life-threatening condition caused by a dysregulated host response to infection. Recently, researchers have hypothesized that sepsis consists of a heterogeneous spectrum of distinct subtypes, motivating several studies to identify clusters of se ... Full text Link to item Cite

Inferring Covariance Structure from Multiple Data Sources via Subspace Factor Analysis.

Journal Article Journal of the American Statistical Association · June 2025 Factor analysis provides a canonical framework for imposing lower-dimensional structure such as sparse covariance in high-dimensional data. High-dimensional data on the same set of variables are often collected under different conditions, for instance in r ... Full text Cite

Nonparametric IPSS: fast, flexible feature selection with false discovery control.

Journal Article Bioinformatics (Oxford, England) · May 2025 Motivation: Feature selection is a critical task in machine learning and statistics. However, existing feature selection methods either (i) rely on parametric methods such as linear or generalized linear models, (ii) lack theoretical false discovery ... Full text Open Access Cite

Robustifying Likelihoods by Optimistically Re-weighting Data.

Journal Article Journal of the American Statistical Association · April 2025 Likelihood-based inferences have been remarkably successful in wide-spanning application areas. However, even after due diligence in selecting a good model for the data at hand, there is inevitably some amount of model misspecification: outliers, data cont ... Full text Cite

Product Centred Dirichlet Processes for Bayesian Multiview Clustering.

Journal Article Journal of the Royal Statistical Society. Series B, Statistical methodology · April 2025 While there is an immense literature on Bayesian methods for clustering, the multiview case has received little attention. This problem focuses on obtaining distinct but statistically dependent clusterings in a common set of entities for different data typ ... Full text Cite

Accelerated algorithms for convex and non-convex optimization on manifolds

Journal Article Machine Learning · March 1, 2025 We propose a general scheme for solving convex and non-convex optimization problems on manifolds. The central idea is that, by adding a multiple of the squared retraction distance to the objective function in question, we “convexify” the objective function ... Full text Cite

LOW-RANK LONGITUDINAL FACTOR REGRESSION WITH APPLICATION TO CHEMICAL MIXTURES.

Journal Article The annals of applied statistics · March 2025 Developmental epidemiology commonly focuses on assessing the association between multiple early life exposures and childhood health. Statistical analyses of data from such studies focus on inferring the contributions of individual exposures, while also cha ... Full text Open Access Cite

INFERRING SYNERGISTIC AND ANTAGONISTIC INTERACTIONS IN MIXTURES OF EXPOSURES

Journal Article Annals of Applied Statistics · March 1, 2025 There is abundant interest in assessing the joint effects of multiple exposures on human health. This is often referred to as the mixtures problem in environmental epidemiology and toxicology. Classically, studies have examined the adverse health effects o ... Full text Cite

Modeling Recurrent Failures on Large Directed Networks

Journal Article Journal of the American Statistical Association · January 1, 2025 Many lifeline infrastructure systems consist of thousands of components configured in a complex directed network. Disruption of the infrastructure constitutes a recurrent failure process over a directed network. Statistical inference for such network recur ... Full text Cite

Bayesian Clustering via Fusing of Localized Densities.

Journal Article Journal of the American Statistical Association · January 2025 Bayesian clustering typically relies on mixture models, with each component interpreted as a different cluster. After defining a prior for the component parameters and weights, Markov chain Monte Carlo (MCMC) algorithms are commonly used to produce samples ... Full text Cite

Bayesian inference for generalized linear models via quasi-posteriors.

Journal Article Biometrika · January 2025 Generalized linear models are routinely used for modelling relationships between a response variable and a set of covariates. The simple form of a generalized linear model comes with easy interpretability, but also leads to concerns about model misspecific ... Full text Cite

Exact sampling of spanning trees via fast-forwarded random walks

Journal Article Biometrika · January 1, 2025 Tree graphs are used routinely in statistics. When estimating a Bayesian model with a tree component, sampling the posterior remains a core difficulty. Existing Markov chain Monte Carlo methods tend to rely on local moves, often leading to poor mixing. A p ... Full text Cite

Domain Adaptive Bootstrap Aggregating

Journal Article IEEE Transactions on Signal Processing · January 1, 2025 When there is a distributional shift between data used to train a predictive algorithm and current data, performance can suffer. This is known as the domain adaptation problem. Bootstrap aggregating, or bagging, is a popular method for improving the stabil ... Full text Cite

Infinite joint species distribution models

Journal Article Biometrika · January 1, 2025 Joint species distribution models are popular in ecology for modelling covariate effects on species occurrence, while characterizing cross-species dependence. Data consist of multivariate binary indicators of the occurrences of different species in each sa ... Full text Cite

Factor pretraining in Bayesian multivariate logistic models

Journal Article Biometrika · January 1, 2025 This article focuses on inference in logistic regression for high-dimensional binary outcomes. A popular approach induces dependence across the outcomes by including latent factors in the linear predictor. Bayesian approaches are useful for characterizing ... Full text Cite

Radial Neighbors for Provably Accurate Scalable Approximations of Gaussian Processes.

Journal Article Biometrika · December 2024 In geostatistical problems with massive sample size, Gaussian processes can be approximated using sparse directed acyclic graphs to achieve scalable O(n) computational complexity. In these models, data at each location are typically assumed conditionally d ... Full text Cite

Brain network fingerprints of Alzheimer's disease risk factors in mouse models with humanized APOE alleles.

Journal Article Magn Reson Imaging · December 2024 Alzheimer's disease (AD) presents complex challenges due to its multifactorial nature, poorly understood etiology, and late detection. The mechanisms through which genetic and modifiable risk factors influence disease susceptibility are under intense inves ... Full text Link to item Cite

Bayesian semiparametric inference in longitudinal metabolomics data.

Journal Article Scientific reports · December 2024 The article is motivated by an application to the EarlyBird cohort study aiming to explore how anthropometrics and clinical and metabolic processes are associated with obesity and glucose control during childhood. There is interest in inferring the relatio ... Full text Cite

SPATIAL PREDICTIONS ON PHYSICALLY CONSTRAINED DOMAINS: APPLICATIONS TO ARCTIC SEA SALINITY DATA.

Journal Article The annals of applied statistics · June 2024 In this paper we predict sea surface salinity (SSS) in the Arctic Ocean based on satellite measurements. SSS is a crucial indicator for ongoing changes in the Arctic Ocean and can offer important insights about climate change. We particularly focus on area ... Full text Cite

Detecting changes in the transmission rate of a stochastic epidemic model.

Journal Article Statistics in medicine · May 2024 Throughout the course of an epidemic, the rate at which disease spreads varies with behavioral changes, the emergence of new disease variants, and the introduction of mitigation policies. Estimating such changes in transmission rates can help us better mod ... Full text Cite

Spatial meshing for general Bayesian multivariate models.

Journal Article Journal of machine learning research : JMLR · March 2024 Quantifying spatial and/or temporal associations in multivariate geolocated data of different types is achievable via spatial random effects in a Bayesian hierarchical model, but severe computational bottlenecks arise when spatial dependence is encoded as ... Full text Cite

Bayesian inference on high-dimensional multivariate binary responses.

Journal Article Journal of the American Statistical Association · January 2024 It has become increasingly common to collect high-dimensional binary response data; for example, with the emergence of new sampling techniques in ecology. In smaller dimensions, multivariate probit (MVP) models are routinely used for inferences. However, a ... Full text Cite

Ellipsoid fitting with the Cayley transform.

Journal Article IEEE transactions on signal processing : a publication of the IEEE Signal Processing Society · January 2024 We introduce Cayley transform ellipsoid fitting (CTEF), an algorithm that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid specific, meaning it always returns elliptic solu ... Full text Open Access Cite

Nonparametric Bayes multiresolution testing for high-dimensional rare events

Journal Article Journal of Nonparametric Statistics · January 1, 2024 In a variety of application areas, there is interest in assessing evidence of differences in the intensity of event realizations between groups. For example, in cancer genomic studies collecting data on rare variants, the focus is on assessing whether and ... Full text Cite

Emerging Directions in Bayesian Computation

Journal Article Statistical Science · January 1, 2024 Bayesian models are powerful tools for studying complex data, allowing the analyst to encode rich hierarchical dependencies and leverage prior information. Most importantly, they facilitate a complete characterization of uncertainty through the posterior d ... Full text Cite

Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI

Conference Proceedings of Machine Learning Research · January 1, 2024 In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metr ... Cite

Hierarchical Shrinkage Gaussian Processes: Applications to Computer Code Emulation and Dynamical System Recovery

Journal Article SIAM/ASA Journal on Uncertainty Quantification · January 1, 2024 In many areas of science and engineering, computer simulations are widely used as proxies for physical experiments, which can be infeasible or unethical. Such simulations are often computationally expensive, and an emulator can be trained to efficiently pr ... Full text Cite

Motion-invariant variational autoencoding of brain structural connectomes.

Journal Article Imaging neuroscience (Cambridge, Mass.) · January 2024 Mapping of human brain structural connectomes via diffusion magnetic resonance imaging (dMRI) offers a unique opportunity to understand brain structural connectivity and relate it to various human traits, such as cognition. However, head displacement durin ... Full text Cite

Explaining transmission rate variations and forecasting epidemic spread in multiple regions with a semiparametric mixed effects SIR model.

Journal Article Biometrics · December 2023 The transmission rate is a central parameter in mathematical models of infectious disease. Its pivotal role in outbreak dynamics makes estimating the current transmission rate and uncovering its dependence on relevant covariates a core challenge in epidemi ... Full text Cite

A generalized Bayes framework for probabilistic clustering.

Journal Article Biometrika · September 2023 Loss-based clustering methods, such as k-means clustering and its variants, are standard tools for finding groups in data. However, the lack of quantification of uncertainty in the estimated clusters is a disadvantage. Model-based clustering based on mixtu ... Full text Cite

PiPs: A kernel-based optimization scheme for analyzing non-stationary 1D signals

Journal Article Applied and Computational Harmonic Analysis · September 1, 2023 This paper proposes a novel kernel-based optimization scheme to handle tasks in the analysis, e.g., signal spectral estimation and single-channel source separation of 1D non-stationary oscillatory data. The key insight of our optimization scheme for recons ... Full text Cite

Estimating a brain network predictive of stress and genotype with supervised autoencoders.

Journal Article J R Stat Soc Ser C Appl Stat · August 2023 Targeted brain stimulation has the potential to treat mental illnesses. We develop an approach to help design protocols by identifying relevant multi-region electrical dynamics. Our approach models these dynamics as a superposition of latent networks, wher ... Full text Link to item Cite

PPA: Principal parcellation analysis for brain connectomes and multiple traits.

Journal Article NeuroImage · August 2023 Our understanding of the structure of the brain and its relationships with human traits is largely determined by how we represent the structural connectome. Standard practice divides the brain into regions of interest (ROIs) and represents the connectome a ... Full text Cite

Bayesian matrix completion for hypothesis testing.

Journal Article Journal of the Royal Statistical Society. Series C, Applied statistics · May 2023 We aim to infer bioactivity of each chemical by assay endpoint combination, addressing sparsity of toxicology data. We propose a Bayesian hierarchical framework which borrows information across different chemicals and assay endpoints, facilitates out-of-sa ... Full text Cite

Identifying vulnerable brain networks associated with Alzheimer's disease risk.

Journal Article Cereb Cortex · April 25, 2023 The selective vulnerability of brain networks in individuals at risk for Alzheimer's disease (AD) may help differentiate pathological from normal aging at asymptomatic stages, allowing the implementation of more effective interventions. We used a sample of ... Full text Open Access Link to item Cite

Bayesian Pyramids: identifiable multilayer discrete latent structure models for discrete data

Journal Article Journal of the Royal Statistical Society Series B Statistical Methodology · April 1, 2023 High-dimensional categorical data are routinely collected in biomedical and social sciences. It is of great importance to build interpretable parsimonious models that perform dimension reduction and uncover meaningful latent structures from such discrete d ... Full text Cite

Escaping The Curse of Dimensionality in Bayesian Model-Based Clustering.

Journal Article Journal of machine learning research : JMLR · April 2023 Bayesian mixture models are widely used for clustering of high-dimensional data with appropriate uncertainty quantification. However, as the dimension of the observations increases, posterior inference often tends to favor too many or too few clusters. Thi ... Full text Cite

Inferring taxonomic placement from DNA barcoding aiding in discovery of new taxa

Journal Article Methods in Ecology and Evolution · February 1, 2023 Predicting the taxonomic affiliation of DNA sequences collected from biological samples is a fundamental step in biodiversity assessment. This task is performed by leveraging existing databases containing reference DNA sequences endowed with a taxonomic id ... Full text Cite

Dimension-Grouped Mixed Membership Models for Multivariate Categorical Data.

Journal Article Journal of machine learning research : JMLR · February 2023 Mixed Membership Models (MMMs) are a popular family of latent structure models for complex multivariate data. Instead of forcing each subject to belong to a single cluster, MMMs incorporate a vector of subject-specific weights characterizing partial member ... Full text Cite

Classification Trees for Imbalanced Data: Surface-to-Volume Regularization

Journal Article Journal of the American Statistical Association · January 1, 2023 Classification algorithms face difficulties when one or more classes have limited training data. We are particularly interested in classification trees, due to their interpretability and flexibility. When data are limited in one or more of the classes, the ... Full text Cite

Bayesian Modeling of Sequential Discoveries.

Journal Article Journal of the American Statistical Association · January 2023 We aim at modeling the appearance of distinct tags in a sequence of labeled objects. Common examples of this type of data include words in a corpus or distinct species in a sample. These sequential discoveries are often summarized via accumulation curves, ... Full text Cite

Mutual information: Measuring nonlinear dependence in longitudinal epidemiological data.

Journal Article PLoS One · 2023 Given a large clinical database of longitudinal patient information including many covariates, it is computationally prohibitive to consider all types of interdependence between patient variables of interest. This challenge motivates the use of mutual info ... Full text Link to item Cite

Covariate-Informed Latent Interaction Models: Addressing Geographic & Taxonomic Bias in Predicting Bird–Plant Interactions

Journal Article Journal of the American Statistical Association · January 1, 2023 Reductions in natural habitats urge that we better understand species’ interconnection and how biological communities respond to environmental changes. However, ecological studies of species’ interactions are limited by their geographic and taxonomic focus ... Full text Cite

Bayesian Inferences on Uncertain Ranks and Orderings: Application to Ranking Players and Lineups

Journal Article Bayesian Analysis · January 1, 2023 It is common to be interested in rankings or order relationships among entities. In complex settings where one does not directly measure a univariate statistic upon which to base ranks, such inferences typically rely on statistical models having entity-spe ... Full text Cite

Posterior Computation with the Gibbs Zig-Zag Sampler

Journal Article Bayesian Analysis · January 1, 2023 An intriguing new class of piecewise deterministic Markov processes (PDMPs) has recently been proposed as an alternative to Markov chain Monte Carlo (MCMC). We propose a new class of PDMPs termed Gibbs zig-zag samplers, which allow parameters to be updated ... Full text Cite

Tree representations of brain structural connectivity via persistent homology.

Journal Article Frontiers in neuroscience · January 2023 The brain structural connectome is generated by a collection of white matter fiber bundles constructed from diffusion weighted MRI (dMRI), acting as highways for neural activity. There has been abundant interest in studying how the structural connectome va ... Full text Cite

Nearest Neighbor Dirichlet Mixtures

Journal Article Journal of Machine Learning Research · January 1, 2023 There is a rich literature on Bayesian methods for density estimation, which characterize the unknown density as a mixture of kernels. Such methods have advantages in terms of providing uncertainty quantification in estimation, while being adaptive to a ri ... Cite

Bayesian Spanning Tree: Estimating the Backbone of the Dependence Graph

Journal Article Journal of Machine Learning Research · January 1, 2023 In multivariate data analysis, it is often important to estimate a graph characterizing dependence among p variables. A popular strategy in Gaussian graphical models and latent Gaussian graphical models uses the non-zero entries in a p × p covariance or pr ... Cite

EXTENDED STOCHASTIC BLOCK MODELS WITH APPLICATION TO CRIMINAL NETWORKS.

Journal Article The annals of applied statistics · December 2022 Reliably learning group structures among nodes in network data is challenging in several applications. We are particularly motivated by studying covert networks that encode relationships among criminals. These data are subject to measurement errors, and ex ... Full text Cite

Generalized infinite factorization models.

Journal Article Biometrika · September 2022 Factorization models express a statistical object of interest in terms of a collection of simpler objects. For example, a matrix or tensor can be expressed as a sum of rank-one components. However, in practice, it can be challenging to infer the relative i ... Full text Cite

BAYESIAN SEMIPARAMETRIC LONG MEMORY MODELS FOR DISCRETIZED EVENT DATA.

Journal Article The annals of applied statistics · September 2022 We introduce a new class of semiparametric latent variable models for long memory discretized event data. The proposed methodology is motivated by a study of bird vocalizations in the Amazon rain forest; the timings of vocalizations exhibit self-similarity ... Full text Cite

Limits of epidemic prediction using SIR models.

Journal Article Journal of mathematical biology · September 2022 The Susceptible-Infectious-Recovered (SIR) equations and their extensions comprise a commonly utilized set of models for understanding and predicting the course of an epidemic. In practice, it is of substantial interest to estimate the model parameters bas ... Full text Open Access Cite

Outlier detection for multi-network data.

Journal Article Bioinformatics (Oxford, England) · August 2022 Motivation: It has become routine in neuroscience studies to measure brain networks for different individuals using neuroimaging. These networks are typically expressed as adjacency matrices, with each cell containing a summary of connectivity betwe ... Full text Cite

Predicting phenotypes from brain connection structure

Journal Article Journal of the Royal Statistical Society Series C Applied Statistics · June 1, 2022 This article focuses on the problem of predicting a response variable based on a network-valued predictor. Our motivation is the development of interpretable and accurate predictive models for cognitive traits and neuro-psychiatric disorders based on an in ... Full text Cite

COMPOSITE MIXTURE OF LOG-LINEAR MODELS WITH APPLICATION TO PSYCHIATRIC STUDIES.

Journal Article The annals of applied statistics · June 2022 Psychiatric studies of suicide provide fundamental insights on the evolution of severe psychopathologies, and contribute to the development of early treatment interventions. Our focus is on modelling different traits of psychosis and their interconnections ... Full text Cite

Graph based Gaussian processes on restricted domains

Journal Article Journal of the Royal Statistical Society Series B Statistical Methodology · April 1, 2022 In nonparametric regression, it is common for the inputs to fall in a restricted subset of Euclidean space. Typical kernel-based methods that do not take into account the intrinsic geometry of the domain across which observations are collected may produce ... Full text Cite

Closer than they appear: A Bayesian perspective on individual-level heterogeneity in risk assessment

Journal Article Journal of the Royal Statistical Society Series A Statistics in Society · April 1, 2022 Risk assessment instruments are used across the criminal justice system to estimate the probability of some future event, such as failure to appear for a court appointment or re-arrest. The estimated probabilities are then used in making decisions at the i ... Full text Cite

Correction to: ‘Approximating posteriors with high-dimensional nuisance parameters via integrated rotated Gaussian approximation’

Journal Article Biometrika · March 1, 2022 In the main paper under subsection ‘3.2. Bayesian variable selection’, all references to ‘5.2’ should read: ‘3.1’. Under subsection ‘5.2. Bayesian variable selection’, the reference to ‘5.3 and 6’ should read: ‘S5.3 and S6’. These errors have now been corr ... Full text Cite

MULTIVARIATE MIXED MEMBERSHIP MODELING: INFERRING DOMAIN-SPECIFIC RISK PROFILES.

Journal Article The annals of applied statistics · March 2022 Characterizing the shared memberships of individuals in a classification scheme poses severe interpretability issues, even when using a moderate number of classes (say 4). Mixed membership models quantify this phenomenon, but they typically focus on goodne ... Full text Cite

Powering Research through Innovative Methods for Mixtures in Epidemiology (PRIME) Program: Novel and Expanded Statistical Methods.

Journal Article International journal of environmental research and public health · January 2022 Humans are exposed to a diverse mixture of chemical and non-chemical exposures across their lifetimes. Well-designed epidemiology studies as well as sophisticated exposure science and related technologies enable the investigation of the health impacts of m ... Full text Cite

Spatial Multivariate Trees for Big Data Bayesian Regression.

Journal Article Journal of machine learning research : JMLR · January 2022 High resolution geospatial data are challenging because standard geostatistical models based on Gaussian processes are known to not scale to large data sizes. While progress has been made towards methods that can be computed more efficiently, considerably ... Full text Cite

GAUSSIAN PROCESS SUBSPACE PREDICTION FOR MODEL REDUCTION

Journal Article SIAM Journal on Scientific Computing · January 1, 2022 Subspace-valued functions arise in a wide range of problems, including parametric reduced order modeling (PROM), parameter reduction, and subspace tracking. In PROM, each parameter point can be associated with a subspace, which is used for Petrov–Galerkin ... Full text Cite

Absolute Winding Number Differentiates Mouse Spatial Navigation Strategies With Genetic Risk for Alzheimer's Disease.

Journal Article Front Neurosci · 2022 Spatial navigation and orientation are emerging as promising markers for altered cognition in prodromal Alzheimer's disease, and even in cognitively normal individuals at risk for Alzheimer's disease. The different APOE gene alleles confer various degrees ... Full text Link to item Cite

Corrigendum: Absolute winding number differentiates mouse spatial navigation strategies with genetic risk for Alzheimer's disease.

Journal Article Front Neurosci · 2022 [This corrects the article DOI: 10.3389/fnins.2022.848654.]. ... Full text Link to item Cite

Exponential-Wrapped Distributions on Symmetric Spaces

Journal Article SIAM Journal on Mathematics of Data Science · January 1, 2022 In many applications, the curvature of the space supporting the data makes the statistical modeling challenging. In this paper we discuss the construction and use of probability distributions wrapped around manifolds using exponential maps. These distribut ... Full text Cite

Graph auto-encoding brain networks with applications to analyzing large-scale brain imaging datasets.

Journal Article NeuroImage · December 2021 There has been a huge interest in studying human brain connectomes inferred from different imaging modalities and exploring their relationships with human traits, such as cognition. Brain connectomes are usually represented as networks, with nodes correspo ... Full text Cite

Spectral convergence of graph Laplacian and heat kernel reconstruction in L∞ from random samples

Journal Article Applied and Computational Harmonic Analysis · November 1, 2021 In the manifold setting, we provide a series of spectral convergence results quantifying how the eigenvectors and eigenvalues of the graph Laplacian converge to the eigenfunctions and eigenvalues of the Laplace-Beltrami operator in the L∞ sense. ... Full text Cite

PERTURBED FACTOR ANALYSIS: ACCOUNTING FOR GROUP DIFFERENCES IN EXPOSURE PROFILES.

Journal Article The annals of applied statistics · September 2021 In this article we investigate group differences in phthalate exposure profiles using NHANES data. Phthalates are a family of industrial chemicals used in plastics and as solvents. There is increasing evidence of adverse health effects of exposure to phtha ... Full text Cite

BAYESIAN JOINT MODELING OF CHEMICAL STRUCTURE AND DOSE RESPONSE CURVES.

Journal Article The annals of applied statistics · September 2021 Today there are approximately 85,000 chemicals regulated under the Toxic Substances Control Act, with around 2,000 new chemicals introduced each year. It is impossible to screen all of these chemicals for potential toxic effects, either via full organism ... Full text Cite

Removing the influence of group variables in high-dimensional predictive modelling.

Journal Article Journal of the Royal Statistical Society. Series A, (Statistics in Society) · July 2021 In many application areas, predictive models are used to support or make important decisions. There is increasing awareness that these models may contain spurious or otherwise undesirable correlations. Such correlations may arise from a variety of sources, ... Full text Cite

Bayesian hierarchical factor regression models to infer cause of death from verbal autopsy data.

Journal Article J R Stat Soc Ser C Appl Stat · June 2021 In low-resource settings where vital registration of death is not routine it is often of critical interest to determine and study the cause of death (COD) for individuals and the cause-specific mortality fraction (CSMF) for populations. Post-mortem autopsi ... Full text Link to item Cite

Approximating posteriors with high-dimensional nuisance parameters via integrated rotated Gaussian approximation.

Journal Article Biometrika · June 2021 Posterior computation for high-dimensional data with many parameters can be challenging. This article focuses on a new method for approximating posterior distributions of a low- to moderate-dimensional parameter in the presence of a high-dimensional or oth ... Full text Cite

Centered Partition Processes: Informative Priors for Clustering (with Discussion).

Journal Article Bayesian analysis · March 2021 There is a very rich literature proposing Bayesian approaches for clustering starting with a prior probability distribution on partitions. Most approaches assume exchangeability, leading to simple representations in terms of Exchangeable Partition Probabil ... Full text Cite

Soft tensor regression

Journal Article Journal of Machine Learning Research · January 1, 2021 Statistical methods relating tensor predictors to scalar outcomes in a regression model generally vectorize the tensor predictor and estimate the coefficients of its entries employing some form of regularization, use summaries of the tensor covariate, or u ... Cite

Bayesian Factor Analysis for Inference on Interactions.

Journal Article Journal of the American Statistical Association · January 2021 This article is motivated by the problem of inference on interactions among chemical exposures impacting human health outcomes. Chemicals often co-occur in the environment or in synthetic mixtures and as a result exposure levels can be highly correlated. W ... Full text Cite

Monte Carlo Simulation on the Stiefel Manifold via Polar Expansion

Journal Article Journal of Computational and Graphical Statistics · January 1, 2021 Motivated by applications to Bayesian inference for statistical models with orthogonal matrix parameters, we present a general approach to Monte Carlo simulation from probability distributions on the Stiefel manifold. To bypass many of ... Full text Cite

Maximum pairwise Bayes factors for covariance structure testing

Journal Article Electronic Journal of Statistics · January 1, 2021 Hypothesis testing of structure in covariance matrices is of significant importance, but faces great challenges in high-dimensional settings. Although consistent frequentist one-sample covariance tests have been proposed, there is a lack of simple, compu ... Full text Cite

Bayesian Distance Clustering.

Journal Article Journal of machine learning research : JMLR · January 2021 Model-based clustering is widely used in a variety of application areas. However, fundamental concerns remain about robustness. In particular, results can be sensitive to the choice of kernel representing the within-cluster data density. Leveraging on prop ... Cite

Bayesian time-aligned factor analysis of paired multivariate time series.

Journal Article Journal of machine learning research : JMLR · January 2021 Many modern data sets require inference methods that can estimate the shared and individual-specific components of variability in collections of matrices that change over time. Promising methods have been developed to analyze these types of data in static ... Full text Cite

Statistical Guarantees for Transformation Based Models with Applications to Implicit Variational Inference

Conference Proceedings of Machine Learning Research · January 1, 2021 Transformation-based methods have been an attractive approach in non-parametric inference for problems such as unconditional and conditional density estimation due to their unique hierarchical structure that models the data as flexible transformation of a ... Cite

Efficient posterior sampling for high-dimensional imbalanced logistic regression.

Journal Article Biometrika · December 2020 Classification with high-dimensional data is of widespread interest and often involves dealing with imbalanced data. Bayesian classification approaches are hampered by the fact that current Markov chain Monte Carlo algorithms for posterior computation beco ... Full text Cite

Estimating densities with non-linear support by using Fisher-Gaussian kernels.

Journal Article Journal of the Royal Statistical Society. Series B, Statistical methodology · December 2020 Current tools for multivariate density estimation struggle when the density is concentrated near a non-linear subspace or manifold. Most approaches require the choice of a kernel, with the multivariate Gaussian kernel by far the most commonly used. Althoug ... Full text Cite

Nonparametric graphical model for counts.

Journal Article Journal of machine learning research : JMLR · December 2020 Although multivariate count data are routinely collected in many application areas, there is surprisingly little work developing flexible models for characterizing their dependence structure. This is particularly true when interest focuses on inferring the ... Cite

IDENTIFYING MAIN EFFECTS AND INTERACTIONS AMONG EXPOSURES USING GAUSSIAN PROCESSES.

Journal Article The annals of applied statistics · December 2020 This article is motivated by the problem of studying the joint effect of different chemical exposures on human health outcomes. This is essentially a nonparametric regression problem, with interest being focused not on a black box for prediction but instea ... Full text Cite

Bayesian cumulative shrinkage for infinite factorizations.

Journal Article Biometrika · September 2020 The dimension of the parameter space is typically unknown in a variety of models that rely on factorizations. For example, in factor analysis the number of latent factors is not known and has to be inferred from the data. Although classical shrinkage prior ... Full text Cite

Discussions

Journal Article International Statistical Review · August 1, 2020 Full text Cite

Bayesian closed surface fitting through tensor products

Journal Article Journal of Machine Learning Research · July 1, 2020 Closed surfaces provide a useful model for 3-d shapes, with the data typically consisting of a cloud of points in $$\mathbb{R}^3$$. The existing literature on closed surface modeling focuses on frequentist point estimation methods that join surface patches along the edg ... Cite

Discontinuous Hamiltonian Monte Carlo for discrete parameters and discontinuous likelihoods

Journal Article Biometrika · June 1, 2020 Hamiltonian Monte Carlo has emerged as a standard tool for posterior computation. In this article we present an extension that can efficiently explore target distributions with discontinuous densities. Our extension in particular enables efficient sampling ... Full text Cite

Projected t-SNE for batch correction.

Journal Article Bioinformatics (Oxford, England) · June 2020 Motivation: Low-dimensional representations of high-dimensional data are routinely employed in biomedical research to visualize, interpret and communicate results from different pipelines. In this article, we propose a novel procedure to directly es ... Full text Cite

Bayesian constraint relaxation.

Journal Article Biometrika · March 2020 Prior information often takes the form of parameter constraints. Bayesian methods include such information through prior distributions having constrained support. By using posterior sampling algorithms, one can quantify uncertainty without relying on asymp ... Full text Cite

The Hastings algorithm at fifty

Journal Article Biometrika · March 1, 2020 In a 1970 Biometrika paper, W. K. Hastings developed a broad class of Markov chain algorithms for sampling from probability distributions that are difficult to sample from directly. The algorithm draws a candidate value from a proposal distribution and acc ... Full text Cite

Computationally efficient joint species distribution modeling of big spatial data.

Journal Article Ecology · February 2020 The ongoing global change and the increased interest in macroecological processes call for the analysis of spatially extensive data on species communities to understand and forecast distributional changes of biodiversity. Recently developed joint species d ... Full text Cite

Comparing and weighting imperfect models using D-probabilities.

Journal Article Journal of the American Statistical Association · January 2020 We propose a new approach for assigning weights to models using a divergence-based method (D-probabilities), relying on evaluating parametric models relative to a nonparametric Bayesian reference using Kullback-Leibler divergence. D-probabilities ar ... Full text Cite

Targeted Random Projection for Prediction From High-Dimensional Features

Journal Article Journal of the American Statistical Association · January 1, 2020 We consider the problem of computationally efficient prediction with high dimensional and highly correlated predictors when accurate variable selection is effectively impossible. Direct application of penalization or Bayesian methods implemented with Marko ... Full text Cite

Random orthogonal matrices and the Cayley transform

Journal Article Bernoulli · January 1, 2020 Random orthogonal matrices play an important role in probability and statistics, arising in multivariate analysis, directional statistics, and models of physical systems, among other areas. Calculations involving random orthogonal matrices are complicated ... Full text Cite

Supervised Autoencoders Learn Robust Joint Factor Models of Neural Activity

Journal Article arXiv preprint arXiv:2004.05209 · 2020 Open Access Cite

Recycling Intermediate Steps to Improve Hamiltonian Monte Carlo

Journal Article Bayesian Analysis · January 1, 2020 Hamiltonian Monte Carlo (HMC) and related algorithms have become routinely used in Bayesian computation. In this article, we present a simple and provably accurate method to improve the efficiency of HMC and related algorithms with essentially no extra com ... Full text Cite

Fiedler regularization: Learning neural networks with graph sparsity

Conference 37th International Conference on Machine Learning Icml 2020 · January 1, 2020 We introduce a novel regularization approach for deep learning that incorporates and respects the underlying graphical structure of the neural network. Existing regularization methods often focus on penalizing weights in a global/uniform manner that ignore ... Cite

Fiedler Regularization: Learning Neural Networks with Graph Sparsity

Conference Proceedings of Machine Learning Research · January 1, 2020 We introduce a novel regularization approach for deep learning that incorporates and respects the underlying graphical structure of the neural network. Existing regularization methods often focus on penalizing weights in a global/uniform manner that ignore ... Cite

Latent Nested Nonparametric Priors (with Discussion).

Journal Article Bayesian analysis · December 2019 Discrete random structures are important tools in Bayesian nonparametrics and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study t ... Full text Cite

Locally convex kernel mixtures: Bayesian subspace learning

Conference Proceedings 18th IEEE International Conference on Machine Learning and Applications Icmla 2019 · December 1, 2019 Kernel mixture models are routinely used for density estimation. However, in multivariate settings, issues arise in efficiently approximating lower-dimensional structure in the data. For example, it is common to suppose that the density is concentrated nea ... Full text Cite

Latent nested nonparametric priors (with discussion)

Journal Article Bayesian Analysis · December 1, 2019 Discrete random structures are important tools in Bayesian nonparametrics and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study t ... Full text Cite

The whole-genome landscape of Burkitt lymphoma subtypes.

Journal Article Blood · November 7, 2019 Burkitt lymphoma (BL) is an aggressive, MYC-driven lymphoma comprising 3 distinct clinical subtypes: sporadic BLs that occur worldwide, endemic BLs that occur predominantly in sub-Saharan Africa, and immunodeficiency-associated BLs that occur primarily in ... Full text Link to item Cite

Bayesian sparse linear regression with unknown symmetric error

Journal Article Information and Inference · September 19, 2019 We study Bayesian procedures for sparse linear regression when the unknown error distribution is endowed with a non-parametric prior. Specifically, we put a symmetrized Dirichlet process mixture of Gaussian prior on the error density, where the mixing dist ... Full text Cite

Tensor network factorizations: Relationships between brain structural connectomes and traits.

Journal Article NeuroImage · August 2019 Advanced brain imaging techniques make it possible to measure individuals' structural connectomes in large cohort studies non-invasively. Given the availability of large scale data sets, it is extremely interesting and important to build a set of advanced ... Full text Cite

A comprehensive evaluation of predictive performance of 33 species distribution models at species and community levels

Journal Article Ecological Monographs · August 1, 2019 A large array of species distribution model (SDM) approaches has been developed for explaining and predicting the occurrences of individual species or species assemblages. Given the wealth of existing models, it is unclear which models perform best for int ... Full text Cite

MCMC for Imbalanced Categorical Data

Journal Article Journal of the American Statistical Association · July 3, 2019 Many modern applications collect highly imbalanced categorical data, with some categories relatively rare. Bayesian hierarchical models combat data sparsity by borrowing information, while also quantifying uncertainty. However, posterior computation presen ... Full text Cite

Intrinsic Gaussian processes on complex constrained domains

Journal Article Journal of the Royal Statistical Society Series B Statistical Methodology · July 1, 2019 We propose a class of intrinsic Gaussian processes (GPs) for interpolation, regression and classification on manifolds with a primary focus on complex constrained domains or irregularly shaped spaces arising as subsets or submanifolds of $$\mathbb{R}$$, $$\mathbb{R}^2$$, ... Full text Cite

Symmetric Bilinear Regression for Signal Subgraph Estimation.

Journal Article IEEE transactions on signal processing : a publication of the IEEE Signal Processing Society · April 2019 There is an increasing interest in learning a set of small outcome-relevant subgraphs in network-predictor regression. The extracted signal subgraphs can greatly improve the interpretation of the association between the network predictor and the response. ... Full text Cite

Robust Bayesian inference via coarsening.

Journal Article Journal of the American Statistical Association · January 2019 The standard approach to Bayesian inference is based on the assumption that the distribution of the data belongs to the chosen model class. However, even a small violation of this assumption can have a large impact on the outcome of a Bayesian procedure. W ... Full text Cite

Nonparametric Bayes Models of Fiber Curves Connecting Brain Regions.

Journal Article Journal of the American Statistical Association · January 2019 In studying structural inter-connections in the human brain, it is common to first estimate fiber bundles connecting different regions relying on diffusion MRI. These fiber bundles act as highways for neural activity. Current statistical methods reduce the ... Full text Cite

On posterior consistency of tail index for Bayesian kernel mixture models

Journal Article Bernoulli · January 1, 2019 Asymptotic theory of tail index estimation has been studied extensively in the frequentist literature on extreme values, but rarely in the Bayesian context. We investigate whether popular Bayesian kernel mixture models are able to support heavy tailed dist ... Full text Cite

Extrinsic Gaussian processes for regression and classification on manifolds

Journal Article Bayesian Analysis · January 1, 2019 Gaussian processes (GPs) are very widely used for modeling of unknown functions or surfaces in applications ranging from regression to classification to spatial processes. Although there is an increasingly vast literature on applications, methods, theory a ... Full text Cite

Identifying Vulnerable Brain Networks in Mouse Models of Genetic Risk Factors for Late Onset Alzheimer's Disease.

Journal Article Front Neuroinform · 2019 The major genetic risk for late onset Alzheimer's disease has been associated with the presence of APOE4 alleles. However, the impact of different APOE alleles on the brain aging trajectory, and how they interact with the brain local environment in a sex s ... Full text Link to item Cite

Report of the Editors—2018

Journal Article Journal of the Royal Statistical Society Series B Statistical Methodology · January 1, 2019 Full text Cite

Convex mixture regression for quantitative risk assessment.

Journal Article Biometrics · December 2018 There is wide interest in studying how the distribution of a continuous response changes with a predictor. We are motivated by environmental applications in which the predictor is the dose of an exposure and the response is a health outcome. A main focus i ... Full text Cite

Bayesian Semiparametric Mixed Effects Markov Models With Application to Vocalization Syntax

Journal Article Journal of the American Statistical Association · October 2, 2018 Studying the neurological, genetic, and evolutionary basis of human vocal communication mechanisms using animal vocalization models is an important field of neuroscience. The datasets typically comprise structured sequences of syllables or “songs” produced ... Full text Cite

Scaling up data augmentation MCMC via calibration

Journal Article Journal of Machine Learning Research · October 1, 2018 There has been considerable interest in making Bayesian inference more scalable. In big data settings, most of the focus has been on reducing the computing time per iteration rather than reducing the number of iterations needed in Markov chain Monte Carlo ... Cite

Scalable Bayes via barycenter in Wasserstein space

Journal Article Journal of Machine Learning Research · August 1, 2018 Divide-and-conquer based methods for Bayesian inference provide a general approach for tractable posterior inference when the sample size is large. These methods divide the data into smaller subsets, sample from the posterior distribution of parameters in ... Cite

Extrema-weighted feature extraction for functional data.

Journal Article Bioinformatics · July 15, 2018 MOTIVATION: Although there is a rich literature on methods for assessing the impact of functional predictors, the focus has been on approaches for dimension reduction that do not suit certain applications. Examples of standard approaches include functional ... Full text Link to item Cite

Bayesian Conditional Density Filtering

Journal Article Journal of Computational and Graphical Statistics · July 3, 2018 We propose a conditional density filtering (C-DF) algorithm for efficient online Bayesian inference. C-DF adapts MCMC sampling to the online setting, sampling from approximations to conditional posterior distributions obtained by propagating surrogate cond ... Full text Cite

Bayesian Multi-Plate High-Throughput Screening of Compounds.

Journal Article Sci Rep · June 22, 2018 High-throughput screening of compounds (chemicals) is an essential part of drug discovery, involving thousands to millions of compounds, with the purpose of identifying candidate hits. Most statistical tools, including the industry standard B-score method, ... Full text Link to item Cite

Theoretical limits of microclustering for record linkage.

Journal Article Biometrika · June 2018 There has been substantial recent interest in record linkage, where one attempts to group the records pertaining to the same entities from one or more large databases that lack unique identifiers. This can be viewed as a type of microclustering, with few o ... Full text Cite

Mapping population-based structural connectomes.

Journal Article NeuroImage · May 2018 Advances in understanding the structural connectomes of human brain require improved approaches for the construction, comparison and integration of high-dimensional whole-brain tractography data from a large number of individuals. This article develops a p ... Full text Cite

Statistics in the big data era: Failures of the machine

Journal Article Statistics and Probability Letters · May 1, 2018 There is vast interest in automated methods for complex data analysis. However, there is a lack of consideration of (1) interpretability, (2) uncertainty quantification, (3) applications with limited training data, and (4) selection bias. Statistical metho ... Full text Cite

Effect of A1C and Glucose on Postoperative Mortality in Noncardiac and Cardiac Surgeries.

Conference Diabetes Care · April 2018 OBJECTIVE: Hemoglobin A1c (A1C) is used in assessment of patients for elective surgeries because hyperglycemia increases risk of adverse events. However, the interplay of A1C, glucose, and surgical outcomes remains unclarified, with often only two of these ... Full text Link to item Cite

Bayesian inference and testing of group differences in brain networks

Journal Article Bayesian Analysis · January 1, 2018 Network data are increasingly collected along with other variables of interest. Our motivation is drawn from neurophysiology studies measuring brain connectivity networks for a sample of individuals along with their membership to a low or high creative rea ... Full text Cite

Active learning of cortical connectivity from two-photon imaging data.

Journal Article PloS one · January 2018 Understanding how groups of neurons interact within a network is a fundamental question in system neuroscience. Instead of passively observing the ongoing activity of a network, we can typically perturb its activity, either by external sensory stimulation ... Full text Cite

Fast Moment Estimation for Generalized Latent Dirichlet Models.

Journal Article Journal of the American Statistical Association · January 2018 We develop a generalized method of moments (GMM) approach for fast parameter estimation in a new class of Dirichlet latent variable models with mixed data types. Parameter estimation via GMM has computational and statistical advantages over alternative met ... Full text Cite

Supplementary Material For “Bayesian Inference And Testing Of Group Differences In Brain Networks”

Journal Article Bayesian Analysis · January 1, 2018 The supplementary materials contain proofs of Propositions 1, 2 and 3, providing theoretical support for the methodology developed in the article “Bayesian Inference and Testing of Group Differences in Brain Networks ... Full text Cite

Report of the Editors—2017

Journal Article Journal of the Royal Statistical Society Series B Statistical Methodology · January 1, 2018 Full text Cite

Robust and scalable Bayes via a median of subset posterior measures

Journal Article Journal of Machine Learning Research · December 1, 2017 We propose a novel approach to Bayesian analysis that is provably robust to outliers in the data and often has computational advantages over standard methods. Our technique is based on splitting the data into non-overlapping subgroups, evaluating the poste ... Cite

Bayesian local extremum splines

Journal Article Biometrika · December 1, 2017 We consider shape-restricted nonparametric regression on a closed set $$\mathcal{X} \subset \mathbb{R},$$ where it is reasonable to assume that the function has no more than $$H$$ local extrema interior to $$\mathcal{X}$$. Following a Bayesian approach we ... Full text Link to item Cite

Exploiting big data in logistics risk assessment via Bayesian nonparametrics

Journal Article Operations Research · November 1, 2017 In cargo logistics, a key performance measure is transport risk, defined as the deviation of the actual arrival time from the planned arrival time. Neither earliness nor tardiness is desirable for customer and freight forwarders. In this paper, we investig ... Full text Cite

Genetic and Functional Drivers of Diffuse Large B Cell Lymphoma.

Journal Article Cell · October 5, 2017 Diffuse large B cell lymphoma (DLBCL) is the most common form of blood cancer and is characterized by a striking degree of genetic and clinical heterogeneity. This heterogeneity poses a major barrier to understanding the genetic basis of the disease and it ... Full text Link to item Cite

Nonparametric Bayes Modeling of Populations of Networks

Journal Article Journal of the American Statistical Association · October 2, 2017 Replicated network data are increasingly available in many research fields. For example, in connectomic applications, interconnections among brain regions are collected for each patient under study, motivating statistical models which can flexibly characte ... Full text Open Access Cite

Rejoinder: Nonparametric Bayes Modeling of Populations of Networks

Journal Article Journal of the American Statistical Association · October 2, 2017 Full text Cite

Bayesian genome- and epigenome-wide association studies with gene level dependence.

Journal Article Biometrics · September 2017 High-throughput genetic and epigenetic data are often screened for associations with an observed phenotype. For example, one may wish to test hundreds of thousands of genetic variants, or DNA methylation sites, for an association with disease status. These ... Full text Cite

Simple, scalable and accurate posterior interval estimation

Journal Article Biometrika · September 1, 2017 Standard posterior sampling algorithms, such as Markov chain Monte Carlo procedures, face major challenges in scaling up to massive datasets. We propose a simple and general posterior interval estimation algorithm to rapidly and accurately estimate quantil ... Full text Cite

Expandable factor analysis.

Journal Article Biometrika · September 2017 Bayesian sparse factor models have proven useful for characterizing dependence in multivariate data, but scaling computation to large numbers of samples and dimensions is problematic. We propose expandable factor analysis for scalable inference in factor m ... Full text Cite

Bayesian tensor regression

Journal Article Journal of Machine Learning Research · August 1, 2017 We propose a Bayesian approach to regression with a scalar response on vector and tensor covariates. Vectorization of the tensor prior to analysis fails to exploit the structure, often leading to poor estimation and predictive performance. We introduce a n ... Cite

Bayesian network-response regression.

Journal Article Bioinformatics (Oxford, England) · June 2017 Motivation: There is increasing interest in learning how human brain networks vary as a function of a continuous trait, but flexible and efficient procedures to accomplish this goal are limited. We develop a Bayesian semiparametric model, which comb ... Full text Open Access Cite

Rat intersubjective decisions are encoded by frequency-specific oscillatory contexts.

Journal Article Brain Behav · June 2017 INTRODUCTION: It is unknown how the brain coordinates decisions to withstand personal costs in order to prevent other individuals' distress. Here we test whether local field potential (LFP) oscillations between brain regions create "neural contexts" that s ... Full text Open Access Link to item Cite

Bayesian functional data modeling for heterogeneous volatility

Journal Article Bayesian Analysis · June 1, 2017 Although there are many methods for functional data analysis, less emphasis is put on characterizing variability among volatilities of individual functions. In particular, certain individuals exhibit erratic swings in their trajectory while other individua ... Full text Cite

Bayesian inference for Matérn repulsive processes

Journal Article Journal of the Royal Statistical Society Series B Statistical Methodology · June 1, 2017 In many applications involving point pattern data, the Poisson process assumption is unrealistic, with the data exhibiting a more regular spread. Such repulsion between events is exhibited by trees for example, because of competition for light and nutrient ... Full text Cite

Enteropathy-associated T cell lymphoma subtypes are characterized by loss of function of SETD2.

Journal Article J Exp Med · May 1, 2017 Enteropathy-associated T cell lymphoma (EATL) is a lethal, and the most common, neoplastic complication of celiac disease. Here, we defined the genetic landscape of EATL through whole-exome sequencing of 69 EATL tumors. SETD2 was the most frequently silenc ... Full text Link to item Cite

How are species interactions structured in species-rich communities? A new method for analysing time-series data.

Journal Article Proceedings. Biological sciences · May 2017 Estimation of intra- and interspecific interactions from time-series on species-rich communities is challenging due to the high number of potentially interacting species pairs. The previously proposed sparse interactions model overcomes this challenge by a ... Full text Cite

How to make more out of community data? A conceptual framework and its implementation as models and software.

Journal Article Ecology letters · May 2017 Community ecology aims to understand what factors determine the assembly and dynamics of species assemblages at different spatiotemporal scales. To facilitate the integration between conceptual and statistical approaches in community ecology, we propose Hi ... Full text Cite

Bayesian modelling of networks in complex business intelligence problems

Journal Article Journal of the Royal Statistical Society Series C Applied Statistics · April 1, 2017 Complex network data problems are increasingly common in many fields of application. Our motivation is drawn from strategic marketing studies monitoring customer choices of specific products, along with co-subscription networks encoding multiple-purchasing ... Full text Cite

The Genetic Basis of Hepatosplenic T-cell Lymphoma.

Journal Article Cancer Discov · April 2017 Hepatosplenic T-cell lymphoma (HSTL) is a rare and lethal lymphoma; the genetic drivers of this disease are unknown. Through whole-exome sequencing of 68 HSTLs, we define recurrently mutated driver genes and copy-number alterations in the disease. Chromati ... Full text Link to item Cite

Bayesian nonparametric inference on the Stiefel manifold

Journal Article Statistica Sinica · April 1, 2017 The Stiefel manifold $$V_{p,d}$$ is the space of all $$d \times p$$ orthonormal matrices, with the $$d-1$$ hypersphere and the space of all orthogonal matrices constituting special cases. In modeling data lying on the Stiefel manifold, parametric distributions such as the mat ... Full text Cite

Using joint species distribution models for evaluating how species-to-species associations depend on the environmental context

Journal Article Methods in Ecology and Evolution · April 1, 2017 Joint species distribution models (JSDM) are increasingly used to analyse community ecology data. Recent progress with JSDMs has provided ecologists with new tools for estimating species associations (residual co-occurrence patterns after accounting for en ... Full text Cite

Toward automated prior choice

Journal Article Statistical Science · February 1, 2017 Full text Cite

Wood-inhabiting fungi with tight associations with other species have declined as a response to forest management

Journal Article Oikos · February 1, 2017 Research on mutualistic and antagonistic networks, such as plant–pollinator and host–parasite networks, has shown that species interactions can influence and be influenced by the responses of species to environmental perturbations. Here we examine whether ... Full text Cite

TENSOR DECOMPOSITIONS AND SPARSE LOG-LINEAR MODELS.

Journal Article Annals of statistics · January 2017 Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categoric ... Full text Cite

Extrinsic local regression on manifold-valued data.

Journal Article Journal of the American Statistical Association · January 2017 We propose an extrinsic regression framework for modeling data with manifold valued responses and Euclidean predictors. Regression with manifold responses has wide applications in shape analysis, neuroscience, medical imaging and many other areas. Our appr ... Full text Cite

Report of the editors-2016

Journal Article Journal of the Royal Statistical Society Series B Statistical Methodology · January 1, 2017 Full text Cite

Sub-optimality of some continuous shrinkage priors

Journal Article Stochastic Processes and their Applications · December 1, 2016 Two-component mixture priors provide a traditional way to induce sparsity in high-dimensional Bayes models. However, several aspects of such a prior, including computational complexities in high-dimensions, interpretation of exact zeros and non-sparse post ... Full text Cite

Locally adaptive dynamic networks

Journal Article Annals of Applied Statistics · December 1, 2016 Our focus is on realistically modeling and forecasting dynamic networks of face-to-face contacts among individuals. Important aspects of such data that lead to problems with current methods include the tendency of the contacts to move between periods of sl ... Full text Cite

Bayesian inference on quasi-sparse count data.

Journal Article Biometrika · December 2016 There is growing interest in analysing high-dimensional count data, which often exhibit quasi-sparsity corresponding to an overabundance of zeros and small nonzero counts. Existing methods for analysing multivariate count data via Poisson or negative binom ... Full text Cite

Bayesian graphical models for multivariate functional data

Journal Article Journal of Machine Learning Research · October 1, 2016 Graphical models express conditional independence relationships among variables. Although methods for vector-valued data are well established, functional data graphical models remain underdeveloped. By functional data, we refer to data that are realization ... Cite

Bayesian Nonparametric Modeling of Higher Order Markov Chains

Journal Article Journal of the American Statistical Association · October 1, 2016 We consider the problem of flexible modeling of higher order Markov chains when an upper bound on the order of the chain is known but the true order and nature of the serial dependence are unknown. We propose Bayesian nonparametric methodology based on con ... Full text Cite

Removing cradle artifacts in X-ray images of paintings

Journal Article SIAM Journal on Imaging Sciences · August 30, 2016 We propose an algorithm that removes the visually unpleasant effects of cradling in X-ray images of panel paintings, with the goal of improving the X-ray image readability by art experts. The algorithm consists of three stages. In the first stage the locat ... Full text Cite

Personalised estimation of a woman's most fertile days.

Journal Article The European journal of contraception & reproductive health care : the official journal of the European Society of Contraception · August 2016 Objectives: We propose a new, personalised approach of estimating a woman's most fertile days that only requires recording the first day of menses and can use a smartphone to convey this information to the user so that she can plan or prevent pregna ... Full text Cite

Dysregulation of Prefrontal Cortex-Mediated Slow-Evolving Limbic Dynamics Drives Stress-Induced Emotional Pathology.

Journal Article Neuron · July 20, 2016 Circuits distributed across cortico-limbic brain regions compose the networks that mediate emotional behavior. The prefrontal cortex (PFC) regulates ultraslow (<1 Hz) dynamics across these networks, and PFC dysfunction is implicated in stress-related illne ... Full text Link to item Cite

Multiscale Bernstein polynomials for densities

Journal Article Statistica Sinica · July 1, 2016 Our focus is on constructing a multiscale nonparametric prior for densities. The Bayes density estimation literature is dominated by single scale methods, with the exception of Polya trees, which favor overly-spiky densities even when the truth is smooth. ... Full text Cite

Nonparametric Bayes modeling with sample survey weights.

Journal Article Statistics & probability letters · June 2016 In population studies, it is standard to sample data via designs in which the population is divided into strata, with the different strata assigned different probabilities of inclusion. Although there have been some proposals for including sample survey we ... Full text Cite

Data augmentation for models based on rejection sampling.

Journal Article Biometrika · June 2016 We present a data augmentation scheme to perform Markov chain Monte Carlo inference for models where data generation involves a rejection sampling algorithm. Our idea is a simple scheme to instantiate the rejected proposals preceding each data point. The r ... Full text Open Access Cite

Compressed Gaussian process for manifold regression

Journal Article Journal of Machine Learning Research · May 1, 2016 Nonparametric regression for large numbers of features (p) is an increasingly important problem. If the sample size n is massive, a common strategy is to partition the feature space, and then separately apply simple models to each partition set. This is no ... Cite

Using latent variable models to identify large networks of species-to-species associations at different spatial scales

Journal Article Methods in Ecology and Evolution · May 1, 2016 We present a hierarchical latent variable model that partitions variation in species occurrences and co-occurrences simultaneously at multiple spatial scales. We illustrate how the parameterized model can be used to predict the occurrences of a species by ... Full text Cite

Online Variational Bayes Inference for High-Dimensional Correlated Data

Journal Article Journal of Computational and Graphical Statistics · April 2, 2016 High-dimensional data with hundreds of thousands of observations are becoming commonplace in many disciplines. The analysis of such data poses many computational challenges, especially when the observations are correlated over time and/or across space. In ... Full text Cite

Bayesian manifold regression

Journal Article Annals of Statistics · April 1, 2016 There is increasing interest in the problem of nonparametric regression with high-dimensional predictors. When the number of predictors D is large, one encounters a daunting problem in attempting to estimate a D-dimensional surface based on limited data. Fo ... Full text Cite

Nonparametric Bayes modeling for case control studies with many predictors.

Journal Article Biometrics · March 2016 It is common in biomedical research to run case-control studies involving high-dimensional predictors, with the main goal being detection of the sparse subset of predictors having a significant association with disease. Usual analyses rely on independent s ... Full text Cite

Subspace segmentation by dense block and sparse representation.

Journal Article Neural networks : the official journal of the International Neural Network Society · March 2016 Subspace segmentation is a fundamental topic in computer vision and machine learning. However, the success of many popular methods is about independent subspace segmentation instead of the more flexible and realistic disjoint subspace segmentation. Focusin ... Full text Cite

Bayesian Conditional Tensor Factorizations for High-Dimensional Classification.

Journal Article Journal of the American Statistical Association · January 2016 In many application areas, data are collected on a categorical response and high-dimensional categorical predictors, with the goals being to build a parsimonious model for classification while doing inferences on the important predictors. In settings such ... Full text Cite

A Foxp2 Mutation Implicated in Human Speech Deficits Alters Sequencing of Ultrasonic Vocalizations in Adult Male Mice.

Journal Article Front Behav Neurosci · 2016 Development of proficient spoken language skills is disrupted by mutations of the FOXP2 transcription factor. A heterozygous missense mutation in the KE family causes speech apraxia, involving difficulty producing words with complex learned sequences of sy ... Full text Open Access Link to item Cite

No penalty no tears: Least squares in high-dimensional linear models

Conference 33rd International Conference on Machine Learning ICML 2016 · January 1, 2016 Ordinary least squares (OLS) is the default method for fitting linear models, but is not applicable for problems with dimensionality larger than the sample size. For these problems, we advocate the use of a generalized version of OLS motivated by ridge re ... Cite

DECOrrelated feature space partitioning for distributed sparse regression

Conference Advances in Neural Information Processing Systems · January 1, 2016 Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for down-scaling the problem size is to first partition the dataset into subsets and then fit using distributed a ... Cite

Scalable geometric density estimation

Conference Proceedings of the 19th International Conference on Artificial Intelligence and Statistics AISTATS 2016 · January 1, 2016 It is standard to assume a low-dimensional structure in estimating a high-dimensional density. However, popular methods, such as probabilistic principal component analysis, scale poorly computationally. We introduce a novel empirical Bayes method that we t ... Cite

Variational Gaussian copula inference

Conference Proceedings of the 19th International Conference on Artificial Intelligence and Statistics AISTATS 2016 · January 1, 2016 We utilize copulas to constitute a unified framework for constructing and optimizing variational proposals in hierarchical Bayesian models. For models with continuous and non-Gaussian hidden variables, we propose a semiparametric and automated variational ... Cite

A hybrid Bayesian approach for genome-wide association studies on related individuals.

Journal Article Bioinformatics (Oxford, England) · December 2015 Motivation: Both single marker and simultaneous analysis face challenges in GWAS due to the large number of markers genotyped for a small number of subjects. This large p small n problem is particularly challenging when the trait under investigation ... Full text Cite

Shared kernel Bayesian screening.

Journal Article Biometrika · December 2015 This article concerns testing for equality of distribution between groups. We focus on screening variables with shared distributional features such as common support, modes and patterns of skewness. We propose a Bayesian testing method using kernel mixture ... Full text Cite

Dirichlet-Laplace priors for optimal shrinkage.

Journal Article Journal of the American Statistical Association · December 2015 Penalized regression methods, such as L1 regularization, are routinely used in high-dimensional applications, and there is a rich literature on optimality properties under sparsity assumptions. In the Bayesian paradigm, sparsity is routin ... Full text Cite

Bayesian nonparametric covariance regression

Journal Article Journal of Machine Learning Research · December 1, 2015 Capturing predictor-dependent correlations amongst the elements of a multivariate response vector is fundamental to numerous applied domains, including neuroscience, epidemiology, and finance. Although there is a rich literature on methods for allowing the ... Cite

Bayesian Compressed Regression

Journal Article Journal of the American Statistical Association · October 2, 2015 As an alternative to variable selection or shrinkage in high-dimensional regression, we propose to randomly compress the predictors prior to analysis. This dramatically reduces storage and computational bottlenecks, performing well when the predictors can ... Full text Cite

Uncovering systematic bias in ratings across categories: A Bayesian approach

Conference RecSys 2015 Proceedings of the 9th ACM Conference on Recommender Systems · September 16, 2015 Recommender systems are routinely equipped with standardized taxonomy that associates each item with one or more categories or genres. Although such information does not directly imply the quality of an item, the distribution of ratings vary greatly across ... Full text Cite

Optimal approximating Markov chains for Bayesian inference

Journal Article · August 13, 2015 The Markov Chain Monte Carlo method is the dominant paradigm for posterior computation in Bayesian analysis. It is common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to ... Open Access Link to item Cite

Semiparametric Bayes local additive models for longitudinal data.

Journal Article Statistics in biosciences · May 2015 In longitudinal data analysis, there is great interest in assessing the impact of predictors on the time-varying trajectory in a response variable. In such settings, an important issue is to account for heterogeneity in the shape of the trajectory among su ... Full text Cite

Benchmark pregnancy rates and the assessment of post-coital contraceptives: an update.

Journal Article Contraception · April 2015 Objective: In 2001, we provided benchmark estimates of probability of pregnancy given a single act of intercourse. Those calculations assumed that intercourse and ovulation are independent. Subsequent research has shown that this assumption is not v ... Full text Cite

Male mice song syntax depends on social contexts and influences female preferences

Journal Article FRONTIERS IN BEHAVIORAL NEUROSCIENCE · April 1, 2015 Full text Open Access Link to item Cite

Erratum: Finite sample posterior concentration in high-dimensional regression (Information and Inference (2015) 3 (103-133) DOI: 10.1093/imaiai/iau003)

Journal Article Information and Inference · March 1, 2015 Artin Armagan's and Rayan Saab's affiliations were switched in the published version of this article. Artin Armagan's affiliation should be: SAS Institute, Inc., Raleigh, NC, USA; Rayan Saab's affiliation should be: Department of Mathematics, University of ... Full text Cite

Joint eQTL assessment of whole blood and dura mater tissue from individuals with Chiari type I malformation.

Journal Article BMC Genomics · January 22, 2015 BACKGROUND: Expression quantitative trait loci (eQTL) play an important role in the regulation of gene expression. Gene expression levels and eQTLs are expected to vary from tissue to tissue, and therefore multi-tissue analyses are necessary to fully under ... Full text Open Access Link to item Cite

Marginally specified priors for non-parametric Bayesian estimation.

Journal Article Journal of the Royal Statistical Society. Series B, Statistical methodology · January 2015 Prior specification for non-parametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. A statistician is unlikely to have informed opinions about all aspects of such a para ... Full text Cite

Bayesian multivariate mixed-scale density estimation

Journal Article Statistics and Its Interface · January 1, 2015 Although continuous density estimation has received abundant attention in the Bayesian nonparametrics literature, there is limited theory on multivariate mixed scale density estimation. In this note, we consider a general framework to jointly model continu ... Full text Cite

Male mice song syntax depends on social contexts and influences female preferences.

Journal Article Front Behav Neurosci · 2015 In 2005, Holy and Guo advanced the idea that male mice produce ultrasonic vocalizations (USV) with some features similar to courtship songs of songbirds. Since then, studies showed that male mice emit USV songs in different contexts (sexual and other) and ... Full text Open Access Link to item Cite

WASP: Scalable Bayes via barycenters of subset posteriors

Conference Journal of Machine Learning Research · January 1, 2015 The promise of Bayesian methods for big data sets has not fully been realized due to the lack of scalable computational algorithms. For massive data, it is necessary to store and process subsets on different machines in a distributed manner. We propose a s ... Cite

Bayesian factorizations of big sparse tensors.

Journal Article Journal of the American Statistical Association · January 2015 It has become routine to collect data that are structured as multiway arrays (tensors). There is an enormous literature on low rank and sparse matrix factorizations, but limited consideration of extensions to the tensor case in statistics. The most common ... Full text Cite

Nonparametric Bayes inference on conditional independence

Journal Article Biometrika · January 1, 2015 In many application areas, a primary focus is on assessing evidence in the data refuting the assumption of independence of Y and X conditionally on Z, with Y response variables, X predictors of interest, and Z covariates. Ideally, one would have methods av ... Full text Cite

Quantifying uncertainty in variable selection with arbitrary matrices

Conference 2015 IEEE 6th International Workshop on Computational Advances in Multi Sensor Adaptive Processing CAMSAP 2015 · January 1, 2015 Probabilistically quantifying uncertainty in parameters, predictions and decisions is a crucial component of broad scientific and engineering applications. This is however difficult if the number of parameters far exceeds the sample size. Although there ar ... Full text Cite

On the consistency theory of high dimensional variable screening

Conference Advances in Neural Information Processing Systems · January 1, 2015 Variable screening is a fast dimension reduction technique for assisting high dimensional feature selection. As a preselection method, it selects a moderate size subset of candidate variables for further refining via feature selection to produce the final ... Cite

Parallelizing MCMC with random partition trees

Conference Advances in Neural Information Processing Systems · January 1, 2015 The modern scale of data has brought new challenges to Bayesian inference. In particular, conventional MCMC algorithms are computationally very expensive for large data sets. A promising approach to solve this problem is embarrassingly parallel MCMC (EP-MC ... Cite

Probabilistic curve learning: Coulomb repulsion and the electrostatic Gaussian process

Conference Advances in Neural Information Processing Systems · January 1, 2015 Learning of low dimensional structure in multidimensional data is a canonical problem in machine learning. One common approach is to suppose that the observed data are close to a lower-dimensional smooth manifold. There are a rich variety of manifold learn ... Cite

Nonparametric Bayes dynamic modelling of relational data

Journal Article Biometrika · December 1, 2014 Symmetric binary matrices representing relations are collected in many areas. Our focus is on dynamically evolving binary relational matrices, with interest being on inference on the relationship structure and prediction. We propose a nonparametric Bayesia ... Full text Cite

Adaptive sampling for Bayesian geospatial models

Journal Article Statistics and Computing · November 1, 2014 Bayesian hierarchical modeling with Gaussian process random effects provides a popular approach for analyzing point-referenced spatial data. For large spatial data sets, however, generic posterior sampling is infeasible due to the extremely high computatio ... Full text Cite

Bayesian Multiscale Modeling of Closed Curves in Point Clouds.

Journal Article Journal of the American Statistical Association · October 2014 Modeling object boundaries based on image or point cloud data is frequently necessary in medical and scientific applications ranging from detecting tumor contours for targeted radiation therapy, to the classification of organisms based on their structural ... Full text Cite

Functional clustering in nested designs: Modeling variability in reproductive epidemiology studies

Journal Article Annals of Applied Statistics · September 1, 2014 We discuss functional clustering procedures for nested designs, where multiple curves are collected for each subject in the study. We start by considering the application of standard functional clustering tools to this problem, which leads to groupings bas ... Full text Cite

Comment

Journal Article Journal of the American Statistical Association · July 3, 2014 Full text Cite

Mechanistic Hierarchical Gaussian Processes.

Journal Article Journal of the American Statistical Association · July 2014 The statistics literature on functional data analysis focuses primarily on flexible black-box approaches, which are designed to allow individual curves to have essentially any shape while characterizing variability. Such methods typically cannot incorporat ... Full text Cite

Learning phenotype densities conditional on many interacting predictors.

Journal Article Bioinformatics (Oxford, England) · June 2014 Motivation: Estimating a phenotype distribution conditional on a set of discrete-valued predictors is a commonly encountered task. For example, interest may be in how the density of a quantitative trait varies with single nucleotide polymorphisms an ... Full text Cite

Finite sample posterior concentration in high-dimensional regression

Journal Article Information and Inference · June 1, 2014 We study the behavior of the posterior distribution in high-dimensional Bayesian Gaussian linear regression models having p ≫ n, where p is the number of predictors and n is the sample size. Our focus is on obtaining quantitative finite sample bounds ensur ... Full text Cite

The genomic landscape of mantle cell lymphoma is related to the epigenetically determined chromatin state of normal B cells.

Journal Article Blood · May 8, 2014 In this study, we define the genetic landscape of mantle cell lymphoma (MCL) through exome sequencing of 56 cases of MCL. We identified recurrent mutations in ATM, CCND1, MLL2, and TP53. We further identified a number of novel genes recurrently mutated in ... Full text Link to item Cite

Bayes variable selection in semiparametric linear models.

Journal Article Journal of the American Statistical Association · March 2014 There is a rich literature on Bayesian variable selection for parametric models. Our focus is on generalizing methods and asymptotic theory established for mixtures of g-priors to semiparametric linear regression models having unknown residual densi ... Full text Cite

Generalized Dynamic Factor Models for Mixed-Measurement Time Series.

Journal Article Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America · February 2014 In this article, we propose generalized Bayesian dynamic factor models for jointly modeling mixed-measurement time series. The framework allows mixed-scale measurements associated with each time series, with different measurements having different distribu ... Full text Cite

Bayesian nonparametric regression with varying residual density.

Journal Article Annals of the Institute of Statistical Mathematics · February 2014 We consider the problem of robust Bayesian inference on the mean regression function allowing the residual density to change flexibly with predictors. The proposed class of models is based on a Gaussian process prior for the mean regression function and mi ... Full text Cite

Digital cradle removal in X-ray images of art paintings

Conference 2014 IEEE International Conference on Image Processing ICIP 2014 · January 28, 2014 We introduce an algorithm that removes the deleterious effect of cradling on X-ray images of paintings on wooden panels. The algorithm consists of a three stage procedure. Firstly, the cradled regions are located automatically. The second step consists of ... Full text Cite

Special issue on Bayesian computing, methods and applications

Journal Article Computational Statistics and Data Analysis · January 1, 2014 Full text Cite

Bayesian modeling of temporal properties of infectious disease in a college student population

Journal Article Journal of Applied Statistics · January 1, 2014 A Bayesian statistical model is developed for analysis of the time-evolving properties of infectious disease, with a particular focus on viruses. The model employs a latent semi-Markovian state process, and the state-transition statistics are driven by thr ... Full text Cite

Improving prediction from Dirichlet process mixtures via enrichment

Journal Article Journal of Machine Learning Research · January 1, 2014 Flexible covariate-dependent density estimation can be achieved by modelling the joint density of the response and covariates as a Dirichlet process mixture. An appealing aspect of this approach is that computations are relatively easy. In this paper, we e ... Cite

Bayesian monotone regression using Gaussian process projection

Journal Article Biometrika · January 1, 2014 Shape-constrained regression analysis has applications in dose-response modelling, environmental risk assessment, disease screening and many other areas. Incorporating the shape constraints can improve estimation efficiency and avoid implausible results. W ... Full text Cite

Locally adaptive factor processes for multivariate time series

Journal Article Journal of Machine Learning Research · January 1, 2014 In modeling multivariate time series, it is important to allow time-varying smoothness in the mean and covariance process. In particular, there may be certain time intervals exhibiting rapid changes and others in which changes are slow. If such time-varyin ... Cite

Enriched Stick Breaking Processes for Functional Data.

Journal Article Journal of the American Statistical Association · January 2014 In many applications involving functional data, prior information is available about the proportion of curves having different attributes. It is not straightforward to include such information in existing procedures for functional data analysis. Generalizi ... Full text Cite

Bayesian dynamic financial networks with time-varying predictors

Journal Article Statistics and Probability Letters · January 1, 2014 We propose a targeted and robust modeling of dependence in multivariate time series via dynamic networks, with time-varying predictors included to improve interpretation and prediction. The model is applied to financial markets, estimating effects of verba ... Full text Cite

Posterior contraction in sparse Bayesian factor models for massive covariance matrices

Journal Article Annals of Statistics · January 1, 2014 Sparse Bayesian factor models are routinely implemented for parsimonious dependence modeling and dimensionality reduction in high-dimensional applications. We provide theoretical understanding of such Bayesian procedures in terms of posterior convergence ra ... Full text Cite

Semiconvex regression for metamodeling-based optimization

Journal Article SIAM Journal on Optimization · January 1, 2014 Stochastic search involves finding a set of controllable parameters that minimizes an unknown objective function using a set of noisy observations. We consider the case when the unknown function is convex and a metamodel is used as a surrogate objective fu ... Full text Cite

Latent factor models for density estimation

Journal Article Biometrika · January 1, 2014 Although discrete mixture modelling has formed the backbone of the literature on Bayesian density estimation, there are some well-known disadvantages. As an alternative to discrete mixtures, we propose a class of priors based on random nonlinear functions ... Full text Cite

Scalable Bayesian low-rank decomposition of incomplete multiway tensors

Conference 31st International Conference on Machine Learning ICML 2014 · January 1, 2014 We present a scalable Bayesian framework for low-rank decomposition of multiway tensor data with missing observations. The key issue of pre-specifying the rank of the decomposition is sidestepped in a principled manner using a multiplicative gamma process ... Cite

Scalable and robust Bayesian inference via the median posterior

Conference 31st International Conference on Machine Learning ICML 2014 · January 1, 2014 Many Bayesian learning methods for massive data benefit from working with small subsets of observations. In particular, significant progress has been made in scalable Bayesian learning via stochastic approximation. However, Bayesian learning methods in dis ... Cite

Median selection subset aggregation for parallel inference

Conference Advances in Neural Information Processing Systems · January 1, 2014 For massive data sets, efficient computation commonly relies on distributed algorithms that store and process subsets of the data on different machines, minimizing communication costs. Our focus is on regression and classification problems involving many f ... Cite

Bayesian logistic Gaussian process models for dynamic networks

Conference Journal of Machine Learning Research · January 1, 2014 Time-varying adjacency matrices encoding the presence or absence of a relation among entities are available in many research fields. Motivated by an application to studying dynamic networks among sports teams, we propose a Bayesian nonparametric model. The ... Cite

Anisotropic function estimation using multi-bandwidth Gaussian processes

Journal Article Annals of Statistics · January 1, 2014 In nonparametric regression problems involving multiple predictors, there is typically interest in estimating an anisotropic multivariate regression surface in the important predictors while discarding the unimportant ones. Our focus is on defining a Bayes ... Full text Cite

Nonparametric Bayes

Chapter · January 1, 2014 I reflect on the past, present, and future of nonparametric Bayesian statistics. Current nonparametric Bayes research tends to be split between theoretical studies, seeking to understand relatively simple models, and machine learning, defining new models a ... Cite

Bayesian crack detection in ultra high resolution multimodal images of paintings

Journal Article 2013 18th International Conference on Digital Signal Processing DSP 2013 · December 6, 2013 The preservation of our cultural heritage is of paramount importance. Thanks to recent developments in digital acquisition techniques, powerful image analysis algorithms are developed which can be useful non-invasive tools to assist in the restoration and ... Full text Open Access Cite

Nonparametric Bayes modelling of count processes

Journal Article Biometrika · December 1, 2013 Data on count processes arise in a variety of applications, including longitudinal, spatial and imaging studies measuring count responses. The literature on statistical models for dependent count data is dominated by models built from hierarchical Poisson ... Full text Cite

Posterior consistency in linear models under shrinkage priors

Journal Article Biometrika · December 1, 2013 We investigate the asymptotic behaviour of posterior distributions of regression coefficients in high-dimensional linear models as the number of dimensions grows with the number of observations. We show that the posterior distribution concentrates in neigh ... Full text Cite

Multivariate convex regression with adaptive partitioning

Journal Article Journal of Machine Learning Research · November 1, 2013 We propose a new, nonparametric method for multivariate regression subject to convexity or concavity constraints on the response function. Convexity constraints are common in economics, statistics, operations research, financial engineering and optimizatio ... Cite

Lipid adjustment for chemical exposures: accounting for concomitant variables.

Journal Article Epidemiology (Cambridge, Mass.) · November 2013 Background: Some environmental chemical exposures are lipophilic and need to be adjusted by serum lipid levels before data analyses. There are currently various strategies that attempt to account for this problem, but all have their drawbacks. To ad ... Full text Cite

Bayesian consensus clustering.

Journal Article Bioinformatics (Oxford, England) · October 2013 Motivation: In biomedical research a growing number of platforms and technologies are used to measure diverse but related information, and the task of clustering a set of objects based on multiple sources of data arises in several applications. Most ... Full text Cite

Analysis of space-time relational data with application to legislative voting

Journal Article Computational Statistics and Data Analysis · July 29, 2013 We consider modeling spatio-temporally indexed relational data, motivated by analysis of voting data for the United States House of Representatives over two decades. The data are characterized by incomplete binary matrices, representing votes of legislator ... Full text Cite

Generalized admixture mapping for complex traits.

Journal Article G3 (Bethesda) · July 8, 2013 Admixture mapping is a popular tool to identify regions of the genome associated with traits in a recently admixed population. Existing methods have been developed primarily for identification of a single locus influencing a dichotomous trait within a case ... Full text Open Access Link to item Cite

Bayesian Gaussian Copula Factor Models for Mixed Data.

Journal Article Journal of the American Statistical Association · June 2013 Gaussian factor models have proven widely useful for parsimoniously characterizing dependence in multivariate data. There is a rich literature on their extension to mixed categorical and continuous variables, using latent Gaussian variables or through gene ... Full text Open Access Cite

Classification via bayesian nonparametric learning of affine subspaces

Journal Article Journal of the American Statistical Association · May 31, 2013 It has become common for datasets to contain large numbers of variables in studies conducted in areas such as genetics, machine vision, image analysis, and many others. When analyzing such data, parametric models are often too inflexible while nonparametri ... Full text Cite

Posterior consistency in conditional distribution estimation.

Journal Article Journal of multivariate analysis · April 2013 A wide variety of priors have been proposed for nonparametric Bayesian estimation of conditional distributions, and there is a clear need for theorems providing conditions on the prior for large support, as well as posterior consistency. Estimation of an u ... Full text Cite

Spatio-temporal modeling of legislation and votes

Journal Article Bayesian Analysis · March 22, 2013 A model is presented for analysis of multivariate binary data with spatio-temporal dependencies, and applied to congressional roll call data from the United States House of Representatives and Senate. The model considers each legislator's constituency (loc ... Full text Cite

Genetic heterogeneity of diffuse large B-cell lymphoma.

Journal Article Proc Natl Acad Sci U S A · January 22, 2013 Diffuse large B-cell lymphoma (DLBCL) is the most common form of lymphoma in adults. The disease exhibits a striking heterogeneity in gene expression profiles and clinical outcomes, but its genetic causes remain to be fully defined. Through whole genome an ... Full text Link to item Cite

Multiscale dictionary learning for estimating conditional distributions

Journal Article Advances in Neural Information Processing Systems · January 1, 2013 Nonparametric estimation of the conditional distribution of a response given high-dimensional features is a challenging problem. It is important to allow not only the mean but also the variance and shape of the response density to change flexibly with featu ... Open Access Cite

Bayesian modeling of temporal properties of infectious disease in a college student population

Journal Article Journal of Applied Statistics · 2013 Cite

Bayesian modeling of temporal dependence in large sparse contingency tables.

Journal Article Journal of the American Statistical Association · January 2013 In many applications, it is of interest to study trends over time in relationships among categorical variables, such as age group, ethnicity, religious affiliation, political party and preference for particular policies. At each time point, a sample of ind ... Full text Cite

Locally Adaptive Bayes Nonparametric Regression via Nested Gaussian Processes.

Journal Article Journal of the American Statistical Association · January 2013 We propose a nested Gaussian process (nGP) as a locally adaptive prior for Bayesian nonparametric regression. Specified through a set of stochastic differential equations (SDEs), the nGP imposes a Gaussian process prior for the function's mth-order ... Full text Cite

Locally adaptive bayesian multivariate time series

Journal Article Advances in Neural Information Processing Systems · January 1, 2013 In modeling multivariate time series, it is important to allow time-varying smoothness in the mean and covariance process. In particular, there may be certain time intervals exhibiting rapid changes and others in which changes are slow. If such locally ada ... Cite

Diagonal orthant multinomial probit models

Conference Journal of Machine Learning Research · January 1, 2013 Bayesian classification commonly relies on probit models, with data augmentation algorithms used for posterior computation. By imputing latent Gaussian variables, one can often trivially adapt computational approaches used in Gaussian models. However, MCMC ... Cite

Bayesian learning of joint distributions of objects

Conference Journal of Machine Learning Research · January 1, 2013 There is increasing interest in broad application areas in defining flexible joint models for data having a variety of measurement scales, while also allowing data of complex types, such as functions, images and documents. We consider a general framework f ... Cite

Bayesian data analysis, third edition

Book · January 1, 2013 Broadening its scope to nonstatisticians, Bayesian Methods for Data Analysis, Third Edition provides an accessible introduction to the foundations and applications of Bayesian analysis. Along with a complete reorganization of the material, this edition con ... Cite

Multichannel electrophysiological spike sorting via joint dictionary learning and mixture modeling

Journal Article IEEE Transactions on Biomedical Engineering · 2013 Open Access Cite

Deep Learning with Hierarchical Convolutional Factor Analysis.

Journal Article IEEE transactions on pattern analysis and machine intelligence · January 2013 Unsupervised multi-layered ("deep") models are considered for general data, with a particular focus on imagery. The model is represented using a hierarchical convolutional factor-analysis construction, with sparse factor loadings and scores. The computatio ... Cite

Efficient Gaussian process regression for large datasets

Journal Article Biometrika · 2013 Gaussian processes are widely used in nonparametric regression, classification and spatiotemporal modelling, facilitated in part by a rich literature on their theoretical properties. However, one of their practical limitations is expensive computation, typ ... Full text Open Access Cite

Bayesian variable selection in quantile regression

Journal Article Statistics and Its Interface · January 1, 2013 In many applications, interest focuses on assessing relationships between predictors and the quantiles of the distribution of a continuous response. For example, in epidemiology studies, cutoffs to define premature delivery have been based on the 10th perc ... Full text Cite

GENERALIZED DOUBLE PARETO SHRINKAGE.

Journal Article Statistica Sinica · January 2013 We propose a generalized double Pareto prior for Bayesian shrinkage estimation and inferences in linear models. The prior can be obtained via a scale mixture of Laplace or normal distributions, forming a bridge between the Laplace and Normal-Jeffreys' prio ... Cite

Adverse subpopulation regression for multivariate outcomes with high-dimensional predictors.

Journal Article Stat Med · December 20, 2012 Biomedical studies have a common interest in assessing relationships between multiple related health outcomes and high-dimensional predictors. For example, in reproductive epidemiology, one may collect pregnancy outcomes such as length of gestation and bir ... Full text Link to item Cite

Bayesian latent factor regression for functional and longitudinal data.

Journal Article Biometrics · December 2012 In studies involving functional data, it is commonly of interest to model the impact of predictors on the distribution of the curves, allowing flexible effects on not only the mean curve but also the distribution about the mean. Characterizing the curve fo ... Full text Cite

The genetic landscape of mutations in Burkitt lymphoma.

Journal Article Nat Genet · December 2012 Burkitt lymphoma is characterized by deregulation of MYC, but the contribution of other genetic mutations to the disease is largely unknown. Here, we describe the first completely sequenced genome from a Burkitt lymphoma tumor and germline DNA from the sam ... Full text Link to item Cite

Nonparametric Bayesian Segmentation of a Multivariate Inhomogeneous Space-Time Poisson Process.

Journal Article Bayesian analysis · December 2012 A nonparametric Bayesian model is proposed for segmenting time-evolving multivariate spatial point process data. An inhomogeneous Poisson process is assumed, with a logistic stick-breaking process (LSBP) used to encourage piecewise-constant spatial Poisson ... Full text Cite

Multiresolution Gaussian processes

Journal Article Advances in Neural Information Processing Systems · December 1, 2012 We propose a multiresolution Gaussian process to capture long-range, non-Markovian dependencies while allowing for abrupt changes and non-stationarity. The multiresolution GP hierarchically couples a collection of smooth GPs, each defined over an element o ... Cite

Repulsive mixtures

Journal Article Advances in Neural Information Processing Systems · December 1, 2012 Discrete mixtures are used routinely in broad sweeping applications ranging from unsupervised settings to fully supervised multi-task learning. Indeed, finite mixtures and infinite mixtures, relying on Dirichlet processes and modifications, have become a s ... Cite

Lognormal and gamma mixed negative binomial regression

Journal Article Proceedings of the 29th International Conference on Machine Learning Icml 2012 · October 10, 2012 In regression analysis of counts, a lack of simple and efficient algorithms for posterior computation has made Bayesian approaches appear unattractive and thus underdeveloped. We propose a lognormal and gamma mixed negative binomial (NB) regression model f ... Open Access Cite

Bayesian watermark attacks

Journal Article Proceedings of the 29th International Conference on Machine Learning Icml 2012 · October 10, 2012 This paper presents an application of statistical machine learning to the field of watermarking. We propose a new attack model on additive spread-spectrum watermarking systems. The proposed attack is based on Bayesian statistics. We consider the scenario ... Cite

Ensemble methods for convex regression with applications to geometric programming based circuit design

Journal Article Proceedings of the 29th International Conference on Machine Learning Icml 2012 · October 10, 2012 Convex regression is a promising area for bridging statistical estimation and deterministic convex optimization. New piecewise linear convex regression methods (Hannah and Dunson, 2011; Magnani and Boyd, 2009) are fast and scalable, but can have instabilit ... Cite

Semiparametric Bayesian local functional models for diffusion tensor tract statistics.

Journal Article NeuroImage · October 2012 We propose a semiparametric Bayesian local functional model (BFM) for the analysis of multiple diffusion properties (e.g., fractional anisotropy) along white matter fiber bundles with a set of covariates of interest, such as age and gender. BFM accounts fo ... Full text Cite

Nonparametric Bayes classification and hypothesis testing on manifolds

Journal Article Journal of Multivariate Analysis · October 1, 2012 Our first focus is prediction of a categorical response variable using features that lie on a general manifold. For example, the manifold may correspond to the surface of a hypersphere. We propose a general kernel mixture model for the joint distribution o ... Full text Cite

Strong consistency of nonparametric Bayes density estimation on compact metric spaces with applications to specific manifolds.

Journal Article Annals of the Institute of Statistical Mathematics · August 2012 This article considers a broad class of kernel mixture density models on compact metric spaces and manifolds. Following a Bayesian approach with a nonparametric prior on the location mixing distribution, sufficient conditions are obtained on the kernel, pr ... Full text Cite

Simplex Factor Models for Multivariate Unordered Categorical Data.

Journal Article Journal of the American Statistical Association · March 2012 Gaussian latent factor models are routinely used for modeling of dependence in continuous, binary, and ordered categorical data. For unordered categorical variables, Gaussian latent factor models lead to challenging computation and complex modeling structu ... Full text Cite

Nonparametric Bayes Regression and Classification Through Mixtures of Product Kernels

Chapter · January 19, 2012 It is routine in many fields to collect data having a variety of measurement scales and supports. For example, in biomedical studies for each patient one may collect functional data on a biomarker over time, gene expression values normalized to lie on a hy ... Full text Cite

Hierarchical latent dictionaries for models of brain activation

Conference Journal of Machine Learning Research · January 1, 2012 In this work, we propose a hierarchical latent dictionary approach to estimate the time-varying mean and covariance of a process for which we have only limited noisy samples. We fully leverage the limited sample size and redundancy in sensor measurements by ... Cite

Nonparametric Bayes Modeling of Multivariate Categorical Data.

Journal Article Journal of the American Statistical Association · January 2012 Modeling of multivariate unordered categorical (nominal) data is a challenging problem, particularly in high dimensions and cases in which one wishes to avoid strong assumptions about the dependence structure. Commonly used approaches rely on the incorpora ... Full text Cite

Nonparametric Bayesian dictionary learning for analysis of noisy and incomplete images.

Journal Article IEEE transactions on image processing : a publication of the IEEE Signal Processing Society · January 2012 Nonparametric Bayesian methods are considered for recovery of imagery based upon compressive, incomplete, and/or noisy measurements. A truncated beta-Bernoulli process is employed to infer an appropriate dictionary for the data under test and also for imag ... Full text Cite

Beta-negative binomial process and poisson factor analysis

Journal Article Journal of Machine Learning Research · January 1, 2012 A beta-negative binomial (BNB) process is proposed, leading to a beta-gamma-Poisson process, which may be viewed as a "multiscoop" generalization of the beta-Bernoulli process. The BNB process is augmented into a beta-gamma-gamma-Poisson hierarchical struc ... Open Access Cite

High-Dimensional Longitudinal Genomic Data: An analysis used for monitoring viral infections.

Journal Article IEEE Signal Process Mag · January 1, 2012 Full text Link to item Cite

The kernel beta process

Journal Article Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011 · December 1, 2011 A new Lévy process prior is proposed for an uncountable collection of covariate-dependent feature-learning measures; the model is called the kernel beta process (KBP). Available covariates are handled efficiently via the kernel construction, with covariate ... Cite

Hierarchical topic modeling for analysis of time-evolving personal choices

Journal Article Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011 · December 1, 2011 The nested Chinese restaurant process is extended to design a nonparametric topic-model tree for representation of human choices. Each tree path corresponds to a type of person, and each node (topic) has a corresponding probability vector over items that m ... Cite

Bayesian Kernel Mixtures for Counts.

Journal Article Journal of the American Statistical Association · December 2011 Although Bayesian nonparametric mixture models for continuous data are well developed, there is a limited literature on related approaches for count data. A common strategy is to use a mixture of Poissons, which unfortunately is quite restrictive in not ac ... Full text Cite

The hierarchical beta process for convolutional factor analysis and deep learning

Journal Article Proceedings of the 28th International Conference on Machine Learning Icml 2011 · October 7, 2011 A convolutional factor-analysis model is developed, with the number of filters (factors) inferred via the beta process (BP) and hierarchical BP, for single-task and multi-task learning, respectively. The computation of the model parameters is implemented w ... Cite

Approximate dynamic programming for storage problems

Journal Article Proceedings of the 28th International Conference on Machine Learning Icml 2011 · October 7, 2011 Storage problems are an important subclass of stochastic control problems. This paper presents a new method, approximate dynamic programming for storage, to solve storage problems with continuous, convex decision sets. Unlike other solution procedures, ADP ... Cite

Bayesian isotonic density regression.

Journal Article Biometrika · September 2011 Density regression models allow the conditional distribution of the response given predictors to change flexibly over the predictor space. Such models are much more flexible than nonparametric mean regression models with nonparametric residual distribution ... Full text Cite

Semiparametric bayes' proportional odds models for current status data with underreporting.

Journal Article Biometrics · September 2011 Current status data are a type of interval-censored event time data in which all the individuals are either left or right censored. For example, our motivation is drawn from a cross-sectional study, which measured whether or not fibroid onset had occurred ... Full text Cite

Nonparametric Bayes Stochastically Ordered Latent Class Models.

Journal Article J Am Stat Assoc · September 1, 2011 Latent class models (LCMs) are used increasingly for addressing a broad variety of problems, including sparse modeling of multivariate and longitudinal data, model-based clustering, and flexible inferences on predictor effects. Typical frequentist LCMs req ... Full text Link to item Cite

Covariate-dependent dictionary learning and sparse coding

Journal Article ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings · August 18, 2011 A dependent hierarchical beta process (dHBP) is developed as a prior for data that may be represented in terms of a sparse set of latent features (dictionary elements), with covariate-dependent feature usage. The dHBP is applicable to general covariates an ... Full text Cite

Sparse variational analysis of linear mixed models for large data sets

Journal Article Statistics and Probability Letters · August 1, 2011 It is increasingly common to be faced with longitudinal or multi-level data sets that have large numbers of predictors and/or a large sample size. Current methods of fitting and inference for mixed effects models tend to perform poorly in such settings. Wh ... Full text Cite

Bayesian inference for genomic data integration reduces misclassification rate in predicting protein-protein interactions.

Journal Article PLoS computational biology · July 2011 Protein-protein interactions (PPIs) are essential to most fundamental cellular processes. There has been increasing interest in reconstructing PPIs networks. However, several critical difficulties exist in obtaining reliable predictions. Noticeably, false ... Full text Open Access Cite

High-dimensional variable selection in meta-analysis for censored data.

Journal Article Biometrics · June 2011 This article considers the problem of selecting predictors of time to an event from a high-dimensional set of candidate predictors using data from multiple studies. As an alternative to the current multistage testing approaches, we propose to model the stu ... Full text Cite

Sparse Bayesian infinite factor models.

Journal Article Biometrika · June 2011 We focus on sparse modelling of high-dimensional covariance matrices using Bayesian latent factor models. We propose a multiplicative gamma process shrinkage prior on the factor loadings which allows introduction of infinitely many factors, with the loadin ... Full text Cite

Posterior simulation across nonparametric models for functional clustering

Journal Article Sankhya B · May 1, 2011 By choosing a species sampling random probability measure for the distribution of the basis coefficients, a general class of nonparametric Bayesian methods for clustering of functional data is developed. Allowing the basis functions to be unknown, one face ... Full text Cite

Bayesian Local Contamination Models for Multivariate Outliers.

Journal Article Technometrics : a journal of statistics for the physical, chemical, and engineering sciences · May 2011 In studies where data are generated from multiple locations or sources it is common for there to exist observations that are quite unlike the majority. Motivated by the application of establishing a reference value in an inter-laboratory setting when outly ... Full text Cite

Impaired limbic gamma oscillatory synchrony during anxiety-related behavior in a genetic mouse model of bipolar mania.

Journal Article J Neurosci · April 27, 2011 Alterations in anxiety-related processing are observed across many neuropsychiatric disorders, including bipolar disorder. Though polymorphisms in a number of circadian genes confer risk for this disorder, little is known about how changes in circadian gen ... Full text Link to item Cite

Bayesian Spatial Quantile Regression.

Journal Article Journal of the American Statistical Association · March 2011 Tropospheric ozone is one of the six criteria pollutants regulated by the United States Environmental Protection Agency under the Clean Air Act and has been linked with several adverse health effects, including mortality. Due to the strong dependence on we ... Full text Cite

Learning Low-Dimensional Signal Models: A Bayesian approach based on incomplete measurements.

Journal Article IEEE signal processing magazine · March 2011 Full text Cite

Erratum: Compressive sensing on manifolds using a nonparametric mixture of factor analyzers: Algorithm and performance bounds (IEEE Transactions Signal Processing (2011) 58, 12 (6140-6155))

Journal Article IEEE Transactions on Signal Processing · March 1, 2011 Full text Cite

Bayesian geostatistical modelling with informative sampling locations.

Journal Article Biometrika · March 2011 We consider geostatistical models that allow the locations at which data are collected to be informative about the outcomes. A Bayesian approach is proposed, which models the locations using a log Gaussian Cox process, while modelling the outcomes conditio ... Full text Cite

Nonparametric Bayesian models through probit stick-breaking processes.

Journal Article Bayesian analysis · March 2011 We describe a novel class of Bayesian nonparametric priors based on stick-breaking constructions where the weights of the process are constructed as probit transformations of normal random variables. We show that these priors are extremely flexible, allowi ... Full text Cite

The local Dirichlet process.

Journal Article Annals of the Institute of Statistical Mathematics · February 2011 As a generalization of the Dirichlet process (DP) to allow predictor dependence, we propose a local Dirichlet process (lDP). The lDP provides a prior distribution for a collection of random probability measures indexed by predictors. This is accomplished b ... Full text Cite

Bayesian Variable Selection via Particle Stochastic Search.

Journal Article Statistics & probability letters · February 2011 We focus on Bayesian variable selection in regression models. One challenge is to search the huge model space adequately, while identifying high posterior probability regions. In the past decades, the main focus has been on the use of Markov chain Monte Ca ... Full text Cite

The kernel beta process

Conference Advances in Neural Information Processing Systems 24 25th Annual Conference on Neural Information Processing Systems 2011 Nips 2011 · January 1, 2011 A new Lévy process prior is proposed for an uncountable collection of covariate-dependent feature-learning measures; the model is called the kernel beta process (KBP). Available covariates are handled efficiently via the kernel construction, with covariate ... Cite

Hierarchical topic modeling for analysis of time-evolving personal choices

Conference Advances in Neural Information Processing Systems 24 25th Annual Conference on Neural Information Processing Systems 2011 Nips 2011 · January 1, 2011 The nested Chinese restaurant process is extended to design a nonparametric topic-model tree for representation of human choices. Each tree path corresponds to a type of person, and each node (topic) has a corresponding probability vector over items that m ... Cite

Generalized beta mixtures of Gaussians

Conference Advances in Neural Information Processing Systems 24 25th Annual Conference on Neural Information Processing Systems 2011 Nips 2011 · January 1, 2011 In recent years, a rich variety of shrinkage priors have been proposed that have great promise in addressing massive regression problems. In general, these new priors can be expressed as scale mixtures of normals, but have more complex forms and better pro ... Cite

Predicting Viral Infection From High-Dimensional Biomarker Trajectories.

Journal Article J Am Stat Assoc · January 1, 2011 There is often interest in predicting an individual's latent health status based on high-dimensional biomarkers that vary over time. Motivated by time-course gene expression array data that we have collected in two influenza challenge studies performed wit ... Full text Link to item Cite

Dependent hierarchical beta process for image interpolation and denoising

Journal Article Journal of Machine Learning Research · January 1, 2011 A dependent hierarchical beta process (dHBP) is developed as a prior for data that may be represented in terms of a sparse set of latent features, with covariate-dependent feature usage. The dHBP is applicable to general covariates and data models, imposin ... Cite

Tree-Structured Infinite Sparse Factor Model.

Journal Article Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning · January 2011 A tree-structured multiplicative gamma process (TMGP) is developed, for inferring the depth of a tree-based factor-analysis model. This new model is coupled with the nested Chinese restaurant process, to nonparametrically infer the depth and width (structu ... Cite

Topic Modeling with Nonparametric Markov Tree.

Journal Article Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning · January 2011 A new hierarchical tree-based topic model is developed, based on nonparametric Bayesian techniques. The model has two unique attributes: (i) a child node in the tree may have more than one parent, with the goal of eliminating redundant sub-topics de ... Cite

Logistic Stick-Breaking Process.

Journal Article Journal of machine learning research : JMLR · January 2011 A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via ... Cite

Generalized Beta Mixtures of Gaussians

Journal Article Advances in Neural Information Processing Systems · 2011 In recent years, a rich variety of shrinkage priors have been proposed that have great promise in addressing massive regression problems. In general, these new priors can be expressed as scale mixtures of normals, but have more complex forms and better pro ... Open Access Cite

Preface to the proceedings of AISTATS 2011

Journal Article Journal of Machine Learning Research · January 1, 2011 Cite

Nonparametric bayesian matrix completion

Journal Article 2010 IEEE Sensor Array and Multichannel Signal Processing Workshop SAM 2010 · December 20, 2010 The Beta-Binomial processes are considered for inferring missing values in matrices. The model moves beyond the low-rank assumption, modeling the matrix columns as residing in a nonlinear subspace. Large-scale problems are considered via efficient Gibbs sa ... Full text Cite

Nonparametric Bayesian density estimation on manifolds with applications to planar shapes.

Journal Article Biometrika · December 2010 Statistical analysis on landmark-based shape spaces has diverse applications in morphometrics, medical diagnostics, machine vision and other areas. These shape spaces are non-Euclidean quotient manifolds. To conduct nonparametric inferences, one may define ... Full text Cite

Compressive Sensing on Manifolds Using a Nonparametric Mixture of Factor Analyzers: Algorithm and Performance Bounds.

Journal Article IEEE transactions on signal processing : a publication of the IEEE Signal Processing Society · December 2010 Nonparametric Bayesian methods are employed to constitute a mixture of low-rank Gaussians, for data x ∈ ℝ^N that are of high dimension N but are constrained to reside in a low-dimensional subregion of ℝ^N ... Full text Cite

Joint analysis of time-evolving binary matrices and associated documents

Journal Article Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010 · December 1, 2010 We consider problems for which one has incomplete binary matrices that evolve with time (e.g., the votes of legislators on particular legislation, with each year characterized by a different such matrix). An objective of such analysis is to infer structure ... Cite

Bayesian inference of the number of factors in gene-expression analysis: application to human virus challenge studies.

Journal Article BMC Bioinformatics · November 9, 2010 BACKGROUND: Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis ... Full text Open Access Link to item Cite

Probabilistic Topic Models: A focus on graphical model design and applications to document and image analysis.

Journal Article IEEE signal processing magazine · November 2010 Full text Cite

MULTIVARIATE KERNEL PARTITION PROCESS MIXTURES.

Journal Article Statistica Sinica · October 2010 Mixtures provide a useful approach for relaxing parametric assumptions. Discrete mixture models induce clusters, typically with the same cluster allocation for each parameter in multivariate cases. As a more flexible approach that facilitates sparse nonpar ... Cite

Dynamic model for multivariate markers of fecundability.

Journal Article Biometrics · September 2010 Dynamic latent class models provide a flexible framework for studying biologic processes that evolve over time. Motivated by studies of markers of the fertile days of the menstrual cycle, we propose a discrete-time dynamic latent class framework, allowing ... Full text Cite

Semiparametric Bayes hierarchical models with mean and variance constraints.

Journal Article Computational statistics & data analysis · September 2010 In parametric hierarchical models, it is standard practice to place mean and variance constraints on the latent variable distributions for the sake of identifiability and interpretability. Because incorporation of such constraints is challenging in semipar ... Full text Cite

Are Chinese people really more fertile?

Journal Article Fertility and sterility · August 2010 Full text Cite

Bayesian generalized product partition model

Journal Article Statistica Sinica · July 1, 2010 Starting with a carefully formulated Dirichlet process (DP) mixture model, we derive a generalized product partition model (GPPM) in which the partition process is predictor-dependent. The GPPM generalizes DP clustering to relax the exchangeability assumpt ... Open Access Cite

Stochastically ordered multiple regression.

Journal Article Biostatistics (Oxford, England) · July 2010 In various application areas, prior information is available about the direction of the effects of multiple predictors on the conditional response distribution. For example, in epidemiology studies of potentially adverse exposures and continuous health res ... Full text Cite

Bayesian semiparametric multiple shrinkage.

Journal Article Biometrics · June 2010 High-dimensional and highly correlated data leading to non- or weakly identified effects are commonplace. Maximum likelihood will typically fail in such situations and a variety of shrinkage methods have been proposed. Standard techniques, such as ridge re ... Full text Cite

Dynamic nonparametric Bayesian models for analysis of music

Journal Article Journal of the American Statistical Association · June 1, 2010 The dynamic hierarchical Dirichlet process (dHDP) is developed to model complex sequential data, with a focus on audio signals from music. The music is represented in terms of a sequence of discrete observations, and the sequence is modeled using a hidden ... Full text Open Access Cite

Latent Stick-Breaking Processes.

Journal Article Journal of the American Statistical Association · April 2010 We develop a model for stochastic processes with random marginal distributions. Our model relies on a stick-breaking construction for the marginal distribution of the process, and introduces dependence across locations by using a latent Gaussian copula mod ... Full text Open Access Cite

Classification with Incomplete Data Using Dirichlet Process Priors.

Journal Article Journal of machine learning research : JMLR · March 2010 A non-parametric hierarchical Bayesian framework is developed for designing a classifier, based on a mixture of simple (linear) classifiers. Each simple classifier is termed a local "expert", and the number of experts and their construction are manifested ... Cite

Joint analysis of time-evolving binary matrices and associated documents

Conference Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010 · January 1, 2010 We consider problems for which one has incomplete binary matrices that evolve with time (e.g., the votes of legislators on particular legislation, with each year characterized by a different such matrix). An objective of such analysis is to infer structure ... Cite

Preface to the Proceedings of AISTATS 2011

Journal Article Journal of Machine Learning Research · January 1, 2010 Cite

Two-level stochastic search variable selection in GLMs with missing predictors.

Journal Article The international journal of biostatistics · January 2010 Stochastic search variable selection (SSVS) algorithms provide an appealing and widely used approach for searching for good subsets of predictors while simultaneously estimating posterior model probabilities and model-averaged predictive distributions. Thi ... Full text Cite

Nonparametric Bayes Conditional Distribution Modeling With Variable Selection.

Journal Article Journal of the American Statistical Association · December 2009 This article considers a methodology for flexibly characterizing the relationship between a response and multiple predictors. Goals are (1) to estimate the conditional response distribution addressing the distributional changes across the predictor space, ... Full text Open Access Cite

Comment on article by Craigmile et al.

Journal Article Bayesian Analysis · December 1, 2009 Full text Cite

Multi-task classification with infinite local experts

Journal Article ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings · September 23, 2009 We propose a multi-task learning (MTL) framework for nonlinear classification, based on an infinite set of local experts in feature space. The usage of local experts enables sharing at the expert-level, encouraging the borrowing of information even if task ... Full text Cite

Music analysis with a Bayesian dynamic model

Journal Article ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings · September 23, 2009 A Bayesian dynamic model is developed to model complex sequential data, with a focus on audio signals from music. The music is represented in terms of a sequence of discrete observations, and the sequence is modeled using a hidden Markov model (HMM) with t ... Full text Cite

Bayesian hierarchical functional data analysis via contaminated informative priors.

Journal Article Biometrics · September 2009 A variety of flexible approaches have been proposed for functional data analysis, allowing both the mean curve and the distribution about the mean to be unknown. Such methods are most useful when there is limited prior information. Motivated by application ... Full text Cite

Uterine leiomyomata in relation to insulin-like growth factor-I, insulin, and diabetes.

Journal Article Epidemiology (Cambridge, Mass.) · July 2009 Background: Insulin-like growth factor-I (IGF-I) and insulin stimulate cell proliferation in uterine leiomyoma (fibroid) tissue. We hypothesized that circulating levels of these proteins would be associated with increased prevalence and size of uter ... Full text Cite

Default Prior Distributions and Efficient Posterior Computation in Bayesian Factor Analysis.

Journal Article Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America · June 2009 Factor analytic models are widely used in social sciences. These models have also proven useful for sparse modeling of the covariance structure in multidimensional data. Normal prior distributions for factor loadings and inverse gamma prior distributions f ... Full text Cite

Nonparametric Bayes kernel-based priors for functional data analysis

Journal Article Statistica Sinica · April 1, 2009 We focus on developing nonparametric Bayes methods for collections of dependent random functions, allowing individual curves to vary flexibly while adaptively borrowing information. A prior is proposed, which is expressed as a hierarchical mixture of weigh ... Cite

Bayesian nonparametric hierarchical modeling.

Journal Article Biometrical journal. Biometrische Zeitschrift · April 2009 In biomedical research, hierarchical models are very widely used to accommodate dependence in multivariate and longitudinal data and for borrowing of information across data from different sources. A primary concern in hierarchical modeling is sensitivity ... Full text Cite

Multitask compressive sensing

Journal Article IEEE Transactions on Signal Processing · January 29, 2009 Compressive sensing (CS) is a framework whereby one performs N nonadaptive measurements to constitute a vector v ∈ ℝ^N, with v used to recover an approximation û ∈ ℝ^M to a desired signal u ∈ ℝ^M with N ≪ M; this is performed under ... Full text Cite

Nonparametric Bayes local partition models for random effects.

Journal Article Biometrika · January 2009 This paper focuses on the problem of choosing a prior for an unknown random effects distribution within a Bayesian hierarchical model. The goal is to obtain a sparse representation by allowing a combination of global and local borrowing of information. A l ... Full text Cite

Bayesian semiparametric joint models for functional predictors.

Journal Article Journal of the American Statistical Association · January 2009 Motivated by the need to understand and predict early pregnancy loss using hormonal indicators of pregnancy health, this paper proposes a semiparametric Bayes approach for assessing the relationship between functional predictors and a response. A multivari ... Full text Cite

Bayesian hierarchically weighted finite mixture models for samples of distributions.

Journal Article Biostatistics (Oxford, England) · January 2009 Finite mixtures of Gaussian distributions are known to provide an accurate approximation to any unknown density. Motivated by DNA repair studies in which data are collected for samples of cells from different individuals, we propose a class of hierarchical ... Full text Cite

Fast Bayesian inference in Dirichlet process mixture models

Journal Article Journal of Computational & Graphical Statistics · 2009 Cite

Semiparametric Bayes multiple testing: Applications to tumor data.

Journal Article Biometrics · 2009 Cite

Sparse variational analysis of large longitudinal data sets

Journal Article Statistics & Probability Letters · 2009 Cite

A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation.

Journal Article Advances in neural information processing systems · January 2009 A non-parametric Bayesian model is proposed for processing multiple images. The analysis employs image features and, when present, the words associated with accompanying annotations. The model clusters the images into classes, and each image is segmented i ... Cite

Bayesian Nonparametric Functional Data Analysis Through Density Estimation.

Journal Article Biometrika · January 2009 In many modern experimental settings, observations are obtained in the form of functions, and interest focuses on inferences on a collection of such functions. We propose a hierarchical model that allows us to simultaneously estimate multiple curves nonpar ... Full text Cite

Bayesian nonparametric inference on stochastic ordering.

Journal Article Biometrika · December 2008 This article considers Bayesian inference about collections of unknown distributions subject to a partial stochastic ordering. To address problems in testing of equalities between groups and estimation of group-specific distributions, we propose classes of ... Full text Cite

The nested Dirichlet process: Rejoinder

Journal Article Journal of the American Statistical Association · September 1, 2008 Full text Cite

Multi-task learning for analyzing and sorting large databases of sequential data

Journal Article IEEE Transactions on Signal Processing · August 1, 2008 A new hierarchical nonparametric Bayesian framework is proposed for the problem of multi-task learning (MTL) with sequential data. The models for multiple tasks, each characterized by sequential data, are learned jointly, and the intertask relationships ar ... Full text Cite

Prospective study of breast-feeding in relation to wheeze, atopy, and bronchial hyperresponsiveness in the Avon Longitudinal Study of Parents and Children (ALSPAC).

Journal Article The Journal of allergy and clinical immunology · July 2008 Background: Breast-feeding clearly protects against early wheezing, but recent data suggest that it might increase later risk of atopic disease and asthma. Objective: We sought to examine the relationship between breast-feeding and later asthm ... Full text Cite

Kernel stick-breaking processes

Journal Article Biometrika · June 1, 2008 We propose a class of kernel stick-breaking processes for uncountable collections of dependent random probability measures. The process is constructed by first introducing an infinite sequence of random locations. Independent random probability measures an ... Full text Cite

Nonparametric Bayes testing of changes in a response distribution with an ordinal predictor.

Journal Article Biometrics · June 2008 In certain biomedical studies, one may anticipate changes in the shape of a response distribution across the levels of an ordinal predictor. For instance, in toxicology studies, skewness and modality might change as dose increases. To address this issue, w ... Full text Cite

Bayesian selection and clustering of polymorphisms in functionally related genes

Journal Article Journal of the American Statistical Association · June 1, 2008 In epidemiologic studies, there is often interest in assessing the relationship between polymorphisms in functionally related genes and a health outcome. For each candidate gene, single nucleotide polymorphism (SNP) data are collected at a number of locati ... Full text Cite

The matrix stick-breaking process: Flexible Bayes meta-analysis

Journal Article Journal of the American Statistical Association · March 1, 2008 In analyzing data from multiple related studies, it often is of interest to borrow information across studies and to cluster similar studies. Although parametric hierarchical models are commonly used, of concern is sensitivity to the form chosen for the ra ... Full text Cite

Comment

Journal Article Journal of the American Statistical Association · March 1, 2008 Full text Cite

Bayesian semiparametric structural equation models with latent variables

Journal Article Psychometrika · 2008 Cite

The nested Dirichlet process (with discussion)

Journal Article Journal of the American Statistical Association · 2008 Cite

Bayesian Inference on Changes in Response Densities over Predictor Clusters.

Journal Article Journal of the American Statistical Association · January 2008 In epidemiology, it is often of interest to assess how individuals with different trajectories over time in an environmental exposure or biomarker differ with respect to a continuous response. For ease in interpretation and presentation of results, epidemi ... Full text Cite

Nonparametric functional data analysis through Bayesian density estimation

Journal Article Biometrika · 2008 Cite

The dynamic hierarchical Dirichlet process

Journal Article Proceedings of the 25th International Conference on Machine Learning · January 1, 2008 The dynamic hierarchical Dirichlet process (dHDP) is developed to model the time-evolving statistical properties of sequential data sets. The data collected at any time point are represented via a mixture associated with an appropriate underlying model, in ... Full text Cite

Hierarchical kernel stick-breaking process for multi-task image analysis

Journal Article Proceedings of the 25th International Conference on Machine Learning · January 1, 2008 The kernel stick-breaking process (KSBP) is employed to segment general imagery, imposing the condition that patches (small blocks of pixels) that are spatially proximate are more likely to be associated with the same cluster (segment). The number of clust ... Full text Cite

Multi-task compressive sensing with Dirichlet process priors

Journal Article Proceedings of the 25th International Conference on Machine Learning · January 1, 2008 Compressive sensing (CS) is an emerging field that, under appropriate conditions, can significantly reduce the number of measurements required for a given signal. In many applications, one is interested in multiple signals that may be measured in multiple CS ... Full text Cite

The nested Dirichlet process

Journal Article Journal of the American Statistical Association · January 1, 2008 In multicenter studies, subjects in different centers may have different outcome distributions. This article is motivated by the problem of nonparametric modeling of these distributions, borrowing information across centers while also allowing centers to b ... Full text Cite

Bayesian multivariate isotonic regression splines: Applications to carcinogenicity studies

Journal Article Journal of the American Statistical Association · December 1, 2007 In many applications, interest focuses on assessing the relationship between a predictor and a multivariate outcome variable, and there may be prior knowledge about the shape of the regression curves. For example, regression functions that relate dose of a ... Full text Cite

Bayesian Structural Equation Modeling

Journal Article · December 1, 2007 This chapter focuses on Bayesian structural equation modeling. Structural equation models (SEMs) with latent variables are routinely used in social science research, and are of increasing importance in biomedical applications. Standard practice in implemen ... Full text Cite

Fitting semiparametric random effects models to large data sets.

Journal Article Biostatistics (Oxford, England) · October 2007 For large data sets, it can be difficult or impossible to fit models with random effects using standard algorithms due to memory limitations or high computational burdens. In addition, it would be advantageous to use the abundant information to relax assum ... Full text Cite

Bayesian selection of optimal rules for timing intercourse to conceive by using calendar and mucus.

Journal Article Fertility and sterility · October 2007 Objective: To find optimal clinical rules that maximize the probability of conception while limiting the number of intercourse days required. Design: Multicenter prospective study. Women were followed prospectively while they kept daily record ... Full text Cite

Bayesian methods for latent trait modelling of longitudinal data.

Journal Article Statistical methods in medical research · October 2007 Latent trait models have long been used in the social science literature for studying variables that can only be measured indirectly through multiple items. However, such models are also very useful in accounting for correlation in multivariate and longitu ... Full text Cite

Bayesian adaptive regression splines for hierarchical data.

Journal Article Biometrics · September 2007 This article considers methodology for hierarchical functional data analysis, motivated by studies of reproductive hormone profiles in the menstrual cycle. Current methods standardize the cycle lengths and ignore the timing of ovulation within the cycle, b ... Full text Cite

Multi-task learning for sequential data via iHMMs and the nested Dirichlet process

Journal Article ACM International Conference Proceeding Series · August 23, 2007 A new hierarchical nonparametric Bayesian model is proposed for the problem of multitask learning (MTL) with sequential data. Sequential data are typically modeled with a hidden Markov model (HMM), for which one often must choose an appropriate model struc ... Full text Cite

The matrix stick-breaking process for flexible multi-task learning

Journal Article ACM International Conference Proceeding Series · August 23, 2007 In multi-task learning our goal is to design regression or classification models for each of the tasks and appropriately share information between tasks. A Dirichlet process (DP) prior can be used to encourage task clustering. However, the DP prior does no ... Full text Cite

Effects of sexual intercourse patterns in time to pregnancy studies.

Journal Article American journal of epidemiology · May 2007 Time to pregnancy, typically defined as the number of menstrual cycles required to achieve a clinical pregnancy, is widely used as a measure of couple fecundity in epidemiologic studies. Time to pregnancy studies seldom utilize detailed data on the timing ... Full text Cite

Empirical Bayes density regression

Journal Article Statistica Sinica · April 1, 2007 In Bayesian hierarchical modeling, it is often appealing to allow the conditional density of an (observable or unobservable) random variable Y to change flexibly with categorical and continuous predictors X. A mixture of regression models is proposed, with ... Cite

Bayesian density regression

Journal Article Journal of the Royal Statistical Society Series B Statistical Methodology · April 1, 2007 The paper considers Bayesian methods for density regression, allowing a random probability distribution to change flexibly with multiple predictors. The conditional response distribution is expressed as a non-parametric mixture of regression models, with t ... Full text Cite

Bayesian methods for searching for optimal rules for timing intercourse to achieve pregnancy.

Journal Article Statistics in medicine · April 2007 With societal trends towards increasing age at starting a pregnancy attempt, many women are concerned about achieving conception before the onset of infertility, which precedes menopause. Couples failing to conceive a pregnancy within 12 months are classif ... Full text Cite

Bayesian methods for highly correlated exposure data.

Journal Article Epidemiology (Cambridge, Mass.) · March 2007 Studies that include individuals with multiple highly correlated exposures are common in epidemiology. Because standard maximum likelihood techniques often fail to converge in such instances, hierarchical regression methods have seen increasing use. Bayesi ... Full text Cite

Fixed and random effects selection in linear and logistic models

Journal Article Biometrics · 2007 Cite

Association of physical activity with development of uterine leiomyoma.

Journal Article American journal of epidemiology · January 2007 The relation between physical activity and uterine leiomyomata (fibroids) has received little study, but exercise is protective for breast cancer, another hormonally mediated tumor. Participants in this study were randomly selected members of a health plan ... Full text Cite

Bayesian semiparametric dynamic frailty models for multiple event time data.

Journal Article Biometrics · December 2006 Many biomedical studies collect data on times of occurrence for a health event that can occur repeatedly, such as infection, hospitalization, recurrence of disease, or tumor onset. To analyze such data, it is necessary to account for within-subject depende ... Full text Cite

Bayesian selection of predictors of conception probabilities across the menstrual cycle.

Journal Article Paediatric and perinatal epidemiology · November 2006 There is increasing interest in identifying predictors of human fertility, including environmental exposures, behavioural factors, and biomarkers, such as mucus or reproductive hormones. Epidemiological studies typically measure fecundability, the per mens ... Full text Cite

Foreword. Expanding Methodologies for Capturing Day-Specific Probabilities of Conception.

Journal Article Paediatric and perinatal epidemiology · November 2006 Full text Cite

Bayesian dynamic modeling of latent trait distributions.

Journal Article Biostatistics (Oxford, England) · October 2006 Studies of latent traits often collect data for multiple items measuring different aspects of the trait. For such data, it is common to consider models in which the different items are manifestations of a normal latent variable, which depends on covariates ... Full text Cite

Performance of tests of association in misspecified generalized linear models

Journal Article Journal of Statistical Planning and Inference · September 1, 2006 We examine the effects of modelling errors, such as underfitting and overfitting, on the asymptotic power of tests of association between an explanatory variable x and an outcome in the setting of generalized linear models. The regression function for x is ... Full text Cite

Bayesian covariance selection in generalized linear mixed models.

Journal Article Biometrics · June 2006 The generalized linear mixed model (GLMM), which extends the generalized linear model (GLM) to incorporate random effects characterizing heterogeneity among subjects, is widely used in analyzing correlated and longitudinal data. Although there is often int ... Full text Cite

Special issue of statistical methods in medical research on reproductive studies

Journal Article Statistical Methods in Medical Research · April 1, 2006 Full text Cite

Cervical mucus secretions on the day of intercourse: an accurate marker of highly fertile days.

Journal Article European journal of obstetrics, gynecology, and reproductive biology · March 2006 Objective: To provide estimates of the probabilities of conception according to vulvar mucus observations classified by the woman on the day of intercourse. Study design: Prospective cohort study of 193 outwardly healthy Italian women using th ... Full text Cite

Luteinizing hormone in premenopausal women may stimulate uterine leiomyomata development.

Journal Article Journal of the Society for Gynecologic Investigation · February 2006 Objective: Human chorionic gonadotropin (hCG) has proliferative effects on uterine smooth muscle and leiomyoma tissue in vitro. We hypothesized that luteinizing hormone (LH) would have the same effect by activating the LH/hCG receptor, and it would ... Full text Cite

Fertility Studies

Chapter · January 1, 2006 In recent years there has been increasing concern that human exposure to environmental agents may disrupt the endocrine system and alter reproduction. For example, some studies have observed secular declines in semen quality over the past 50 years, and sev ... Full text Cite

Transgenic Mouse Model

Chapter · January 1, 2006 Full text Cite

The authors replied as follows [2]

Journal Article Biometrics · January 1, 2006 Full text Cite

Bayesian inferences on umbrella orderings.

Journal Article Biometrics · December 2005 In regression applications with categorical predictors, interest often focuses on comparing the null hypothesis of homogeneity to an ordered alternative. This article proposes a Bayesian approach for addressing this problem in the setting of normal linear ... Full text Cite

Bayesian Biostatistics

Journal Article Handbook of Statistics · December 1, 2005 With the rapid increase in biomedical technology and the accompanying generation of complex and high-dimensional data sets, Bayesian statistical methods have become much more widely used. One reason is that the Bayesian probability modeling machinery provi ... Full text Cite

Comments about Joint Modeling of Cluster Size and Binary and Continuous Subunit-Specific Outcomes.

Journal Article Biometrics · September 2005 In longitudinal studies and in clustered situations often binary and continuous response variables are observed and need to be modeled together. In a recent publication Dunson, Chen, and Harry (2003, Biometrics 59, 521-530) (DCH) propose a Bayesian approac ... Full text Cite

Estimation of order-restricted means from correlated data

Journal Article Biometrika · September 1, 2005 In many applications, researchers are interested in estimating the mean of a multivariate normal random vector whose components are subject to order restrictions. Various authors have demonstrated that the likelihood-based methodology may perform poorly un ... Full text Cite

Maternal serum levels of polychlorinated biphenyls and 1,1-dichloro-2,2-bis(p-chlorophenyl)ethylene (DDE) and time to pregnancy.

Journal Article American journal of epidemiology · September 2005 Polychlorinated biphenyls (PCBs), once used widely in transformers and other applications, and 1,1-dichloro-2,2-bis(p-chlorophenyl)ethylene (DDE), the main metabolite of the pesticide 1,1,1-trichloro-2,2-bis(p-chlorophenyl)ethane (DDT), are hormonally acti ... Full text Cite

A transformation approach for incorporating monotone or unimodal constraints.

Journal Article Biostatistics (Oxford, England) · July 2005 Samples of curves are collected in many applications, including studies of reproductive hormone levels in the menstrual cycle. Many approaches have been proposed for correlated functional data of this type, including smoothing spline methods and other flex ... Full text Cite

Bayesian semiparametric isotonic regression for count data

Journal Article Journal of the American Statistical Association · June 1, 2005 This article proposes a semiparametric Bayesian approach for inference on an unknown isotonic regression function, f(x), characterizing the relationship between a continuous predictor, X, and a count response variable, Y, adjusting for covariates, Z. A Dir ... Full text Cite

Bayesian model selection and averaging in additive and proportional hazards models.

Journal Article Lifetime data analysis · June 2005 Although Cox proportional hazards regression is the default analysis for time to event data, there is typically uncertainty about whether the effects of a predictor are more appropriately characterized by a multiplicative or additive model. To accommodate ... Full text Cite

Reduced fertilization rates in older men when cervical mucus is suboptimal.

Journal Article Obstetrics and gynecology · April 2005 Objective: Cervical mucus is vital in the regulation of sperm survival and transport through the reproductive tract. The goal of this study is to assess whether the lowered fertility for men in their late 30s and early 40s is related to the nature o ... Full text Cite

Approximate Bayesian inference for quantiles

Journal Article Journal of Nonparametric Statistics · April 1, 2005 Suppose data consist of a random sample from a distribution function F_Y, which is unknown, and that interest focuses on inferences on θ, a vector of quantiles of F_Y. When the likelihood function is not fully specified, a posterior de ... Full text Cite

Bayesian inferences on predictors of conception probabilities.

Journal Article Biometrics · March 2005 Reproductive scientists and couples attempting pregnancy are interested in identifying predictors of the day-specific probabilities of conception in relation to the timing of a single intercourse act. Because most menstrual cycles have multiple days of int ... Full text Cite

Maternal serum level of the DDT metabolite DDE in relation to fetal loss in previous pregnancies.

Journal Article Environmental research · February 2005 Use of 1,1,1-trichloro-2,2-bis(p-chlorophenyl)ethane (DDT) continues in about 25 countries. This use has been justified partly by the belief that it has no adverse consequences on human health. Evidence has been increasing, however, for adverse reproductiv ... Full text Cite

Bayesian latent variable models for mixed discrete outcomes.

Journal Article Biostatistics (Oxford, England) · January 2005 In studies of complex health conditions, mixtures of discrete outcomes (event time, count, binary, ordered categorical) are commonly collected. For example, studies of skin tumorigenesis record latency time prior to the first tumor, increases in the number ... Full text Cite

Modeling the effects of a bidirectional latent predictor from multivariate questionnaire data.

Journal Article Biometrics · December 2004 Researchers often measure stress using questionnaire data on the occurrence of potentially stress-inducing life events and the strength of reaction to these events, characterized as negative or positive and assigned an ordinal ranking. In studying the heal ... Full text Cite

Bayesian multivariate logistic regression.

Journal Article Biometrics · September 2004 Bayesian analyses of multivariate binary or categorical outcomes typically rely on probit or mixed effects logistic regression models that do not have a marginal logistic structure for the individual outcomes. In addition, difficulties arise when simple no ... Full text Link to item Cite

Bayesian modeling of multiple lesion onset and growth from interval-censored data.

Journal Article Biometrics · September 2004 In studying rates of occurrence and progression of lesions (or tumors), it is typically not possible to obtain exact onset times for each lesion. Instead, data consist of the number of lesions that reach a detectable size between screening examinations, al ... Full text Cite

Studying human fertility and environmental exposures.

Journal Article Environmental health perspectives · August 2004 Full text Open Access Cite

On the frequency of intercourse around ovulation: evidence for biological influences.

Journal Article Human reproduction (Oxford, England) · July 2004 Background: Intercourse in mammals is often coordinated with ovulation, for example through fluctuations in libido or by the acceleration of ovulation with intercourse. Such coordination has not been established in humans. We explored this possibili ... Full text Cite

Selecting factors predictive of heterogeneity in multivariate event time data.

Journal Article Biometrics · June 2004 In multivariate survival analysis, investigators are often interested in testing for heterogeneity among clusters, both overall and within specific classes. We represent different hypotheses about the heterogeneity structure using a sequence of gamma frail ... Full text Cite

Bayesian estimation of survival functions under stochastic precedence.

Journal Article Lifetime data analysis · June 2004 When estimating the distributions of two random variables, X and Y, investigators often have prior information that Y tends to be bigger than X. To formalize this prior belief, one could potentially assume stochastic ordering between X and Y, which implies ... Full text Cite

Bayesian isotonic regression and trend analysis.

Journal Article Biometrics · June 2004 In many applications, the mean of a response variable can be assumed to be a nondecreasing function of a continuous predictor, controlling for covariates. In such cases, interest often focuses on estimating the regression function, while also assessing evi ... Full text Cite

Mucus observations in the fertile window: a better predictor of conception than timing of intercourse.

Journal Article Human reproduction (Oxford, England) · April 2004 Background: Intercourse results in a pregnancy essentially only if it occurs during the 6-day fertile interval ending on the day of ovulation. The strong association between timing of intercourse within this interval and the probability of conceptio ... Full text Cite

Increased infertility with age in men and women.

Journal Article Obstetrics and gynecology · January 2004 Objective: To estimate the effects of aging on the percentage of outwardly healthy couples who are sterile (completely unable to conceive without assisted reproduction) or infertile (unable to conceive within a year of unprotected intercourse).M ... Full text Cite

Methodologic and statistical approaches to studying human fertility and environmental exposure.

Journal Article Environmental health perspectives · January 2004 Although there has been growing concern about the effects of environmental exposures on human fertility, standard epidemiologic study designs may not collect sufficient data to identify subtle effects while properly adjusting for confounding. In particular ... Full text Open Access Cite

Effect of antioxidants on the papilloma response and liver glutathione modulation mediated by arsenic in Tg.AC transgenic mice

Journal Article Arsenic Exposure and Health Effects V · December 18, 2003 Epidemiological studies indicate that inorganic arsenicals produce various skin lesions as well as skin, lung, bladder, liver, prostate, and renal cancer. Our laboratory previously demonstrated that low-dose 12-O-tetradecanoylphorbol-13-acetate (TPA) incre ... Full text Cite

Random effects selection in linear mixed models.

Journal Article Biometrics · December 2003 We address the important practical problem of how to select the random effects component in a linear mixed model. A hierarchical Bayesian model is used to identify any random effect with zero variance. The proposed approach reparameterizes the mixed model ... Full text Cite

Bayesian inferences in the Cox model for order-restricted hypotheses.

Journal Article Biometrics · December 2003 In studying the relationship between an ordered categorical predictor and an event time, it is standard practice to include dichotomous indicators of the different levels of the predictor in a Cox model. One can then use a multiple degree-of-freedom score ... Full text Cite

Dynamic Latent Trait Models for Multidimensional Longitudinal Data

Journal Article Journal of the American Statistical Association · September 1, 2003 This article presents a new approach for analysis of multidimensional longitudinal data, motivated by studies using an item response battery to measure traits of an individual repeatedly over time. A general modeling framework is proposed that allows mixtu ... Full text Cite

A Bayesian approach for joint modeling of cluster size and subunit-specific outcomes.

Journal Article Biometrics · September 2003 In applications that involve clustered data, such as longitudinal studies and developmental toxicity experiments, the number of subunits within a cluster is often correlated with outcomes measured on the individual subunits. Analyses that ignore this depen ... Full text Cite

Bayesian inference on order-constrained parameters in generalized linear models.

Journal Article Biometrics · June 2003 In biomedical studies, there is often interest in assessing the association between one or more ordered categorical predictors and an outcome variable, adjusting for covariates. For a k-level predictor, one typically uses either a k-1 degree of freedom (df ... Full text Cite

Bayesian latent variable models for median regression on multiple outcomes.

Journal Article Biometrics · June 2003 Often a response of interest cannot be measured directly and it is necessary to rely on multiple surrogates, which can be assumed to be conditionally independent given the latent response and observed covariates. Latent response models typically assume tha ... Full text Cite

Vulvar mucus observations and the probability of pregnancy.

Journal Article Obstetrics and gynecology · June 2003 Objective: To assess the day-specific and cycle-specific probabilities of conception leading to clinical pregnancy, in relation to the timing of intercourse and vulvar mucus observations. Methods: This was a retrospective cohort study of women ... Full text Cite

Incorporating heterogeneous intercourse records into time to pregnancy models

Journal Article Mathematical Population Studies · April 1, 2003 Information on the timing of intercourse relative to ovulation can be incorporated into time to pregnancy models to improve the power to detect covariate effects, to estimate the day-specific conception probabilities, and to distinguish between biological ... Full text Cite

Why is parity protective for uterine fibroids?

Journal Article Epidemiology (Cambridge, Mass.) · March 2003 Uterine fibroids are benign tumors, the etiology of which is not understood. Symptoms can be debilitating, and the primary treatment is surgery, usually hysterectomy. Epidemiologic data show that pregnancy is associated with reduced risk of fibroids. We hy ... Full text Cite

Bayesian modeling of time-varying and waning exposure effects.

Journal Article Biometrics · March 2003 In epidemiologic studies, there is often interest in assessing the association between exposure history and disease incidence. For many diseases, incidence may depend not only on cumulative exposure, but also on the ages at which exposure occurred. This ar ... Full text Cite

Bayesian modeling of markers of day-specific fertility

Journal Article Journal of the American Statistical Association · March 1, 2003 Cervical mucus hydration increases during the fertile interval before ovulation. Because sperm can only penetrate mucus having a high water content, cervical secretions provide a reliable marker of the fertile days of the menstrual cycle. This article deve ... Full text Cite

Breast-feeding and the prevalence of asthma and wheeze in children: analyses from the Third National Health and Nutrition Examination Survey, 1988-1994.

Journal Article The Journal of allergy and clinical immunology · February 2003 Background: Asthma prevalence has increased dramatically in recent years, especially among children. Breast-feeding might protect children against asthma and related conditions (recurrent wheeze), and this protective effect might depend on the durat ... Full text Cite

High cumulative incidence of uterine leiomyoma in black and white women: ultrasound evidence.

Journal Article American journal of obstetrics and gynecology · January 2003 Objective: Uterine leiomyoma, or fibroid tumors, are the leading indication for hysterectomy in the United States, but the proportion of women in whom fibroid tumors develop is not known. This study screened for fibroid tumors, independently of clin ... Full text Cite

Bayesian modeling of incidence and progression of disease from cross-sectional data.

Journal Article Biometrics · December 2002 In the absence of longitudinal data, the current presence and severity of disease can be measured for a sample of individuals to investigate factors related to disease incidence and progression. In this article, Bayesian discrete-time stochastic models are ... Full text Cite

Improving risk Assessment: Research opportunities in dose response modeling to improve risk assessment

Journal Article Human and Ecological Risk Assessment · October 1, 2002 Substantial improvements in dose response modeling for risk assessment may result from recent and continuing advances in biological research, biochemical techniques, biostatistical/mathematical methods and computational power. This report provides a ranked ... Full text Cite

TwoDay Algorithm in predicting fertile time.

Journal Article Human reproduction (Oxford, England) · July 2002 Full text Cite

Deficiency of either cyclooxygenase (COX)-1 or COX-2 alters epidermal differentiation and reduces mouse skin tumorigenesis.

Journal Article Cancer research · June 2002 Nonsteroidal anti-inflammatory drugs are widely reported to inhibit carcinogenesis in humans and in rodents. These drugs are believed to act by inhibiting one or both of the known isoforms of cyclooxygenase (COX). However, COX-2, and not COX-1, is the isof ... Cite

Changes with age in the level and duration of fertility in the menstrual cycle.

Journal Article Human reproduction (Oxford, England) · May 2002 Background: Most analyses of age-related changes in fertility cannot separate effects due to reduced frequency of sexual intercourse from effects directly related to ageing. Information on intercourse collected daily through each menstrual cycle pro ... Full text Cite

A proportional hazards model for incidence and induced remission of disease.

Journal Article Biometrics · March 2002 To assess the protective effects of a time-varying covariate, we develop a stochastic model based on tumor biology. The model assumes that individuals have a Poisson-distributed pool of initiated clones, which progress through predetectable, detectable mor ... Full text Cite

Bayesian models for multivariate current status data with informative censoring.

Journal Article Biometrics · March 2002 Multivariate current status data consist of indicators of whether each of several events occurs by the time of a single examination. Our interest focuses on inferences about the joint distribution of the event times. Conventional methods for analysis of mu ... Full text Cite

Mutational fingerprints of aging.

Journal Article Nucleic acids research · January 2002 Using a lacZ plasmid transgenic mouse model, spectra of spontaneous point mutations were determined in brain, heart, liver, spleen and small intestine in young and old mice. While similar at a young age, the mutation spectra among these organs were signifi ... Full text Cite

Bayesian modeling of the level and duration of fertility in the menstrual cycle.

Journal Article Biometrics · December 2001 Time to pregnancy studies that identify ovulation days and collect daily intercourse data can be used to estimate the day-specific probabilities of conception given intercourse on a single day relative to ovulation. In this article, a Bayesian semiparametr ... Full text Cite

The relationship between cervical secretions and the daily probabilities of pregnancy: effectiveness of the TwoDay Algorithm.

Journal Article Human reproduction (Oxford, England) · November 2001 Background: The TwoDay Algorithm is a simple method for identifying the fertile window. It classifies a day as fertile if cervical secretions are present on that day or were present on the day before. This approach may be an effective alternative to ... Full text Cite

Natural limits of pregnancy testing in relation to the expected menstrual period.

Journal Article JAMA · October 2001 Context: Pregnancy test kits routinely recommend testing "as early as the first day of the missed period." However, a pregnancy cannot be detected before the blastocyst implants. Due to natural variability in the timing of ovulation, implantation do ... Full text Cite

Erratum: Topical and oral administration of the natural water-soluble antioxidant from spinach reduces the multiplicity of papillomas in the Tg.AC mouse model (Toxicology Letters (2001) 122 (33-44) PII: S0378427401003459)

Journal Article Toxicology Letters · September 15, 2001 Full text Cite

Antiretroviral therapy effects on genetic and morphologic end points in lymphocytes and sperm of men with human immunodeficiency virus infection.

Journal Article J Infect Dis · July 15, 2001 Many human immunodeficiency virus (HIV)-infected persons receive prolonged treatment with DNA-reactive antiretroviral drugs. A prospective study was conducted of 26 HIV-infected men who provided samples before treatment and at multiple times after beginnin ... Full text Link to item Cite

Commentary: practical advantages of Bayesian analysis of epidemiologic data.

Journal Article American journal of epidemiology · June 2001 In the past decade, there have been enormous advances in the use of Bayesian methodology for analysis of epidemiologic data, and there are now many practical advantages to the Bayesian approach. Bayesian models can easily accommodate unobserved variables s ... Full text Cite

A flexible parametric model for combining current status and age at first diagnosis data.

Journal Article Biometrics · June 2001 In some cross-sectional studies of chronic disease, data consist of the age at examination, whether the disease was present at the exam, and recall of the age at first diagnosis. This article describes a flexible parametric approach for combining current s ... Full text Cite

Topical and oral administration of the natural water-soluble antioxidant from spinach reduces the multiplicity of papillomas in the Tg.AC mouse model.

Journal Article Toxicology letters · May 2001 The Tg.AC mouse carrying the v-Ha-ras structural gene is a useful model for the study of chemical carcinogens, especially those acting via non-genotoxic mechanisms. This study evaluated the efficacy of the non-toxic, water-soluble antioxidant from spinach, ... Full text Cite

Likelihood of conception with a single act of intercourse: providing benchmark rates for assessment of post-coital contraceptives.

Journal Article Contraception · April 2001 Emergency post-coital contraceptives effectively reduce the risk of pregnancy, but their degree of efficacy remains uncertain. Measurement of efficacy depends on the pregnancy rate without treatment, which cannot be measured directly. We provide indirect e ... Full text Cite

Assessing human fertility using several markers of ovulation.

Journal Article Statistics in medicine · March 2001 In modelling human fertility one ideally accounts for timing of intercourse relative to ovulation. Measurement error in identifying the day of ovulation can bias estimates of fecundability parameters and attenuate estimates of covariate effects. In the abs ... Full text Cite

Modeling of changes in tumor burden

Journal Article Journal of Agricultural Biological and Environmental Statistics · March 1, 2001 Skin painting studies on transgenic mice have recently been approved by the Food and Drug Administration (FDA) for carcinogenicity testing. Data consist of serial skin tumor counts on the backs of shaved mice in each of several dose groups. Current methods ... Full text Cite

Factor analytic models of clustered multivariate data with informative censoring.

Journal Article Biometrics · March 2001 This article describes a general class of factor analytic models for the analysis of clustered multivariate data in the presence of informative missingness. We assume that there are distinct sets of cluster-level latent variables related to the primary out ... Full text Cite

Some issues in assessing human fertility

Chapter · January 1, 2001 One of the pleasures of working as an applied statistician is the awareness it brings of the wide diversity of scientific fields to which our profession contributes critical concepts and methods. My own awareness was enhanced by accepting the invitation fr ... Cite

Bayesian incidence analysis of animal tumorigenicity data

Journal Article Journal of the Royal Statistical Society Series C Applied Statistics · January 1, 2001 Statistical inference about tumorigenesis should focus on the tumour incidence rate. Unfortunately, in most animal carcinogenicity experiments, tumours are not observable in live animals and censoring of the tumour onset times is informative. In this paper ... Full text Cite

Distinguishing effects on tumor multiplicity and growth rate in chemoprevention experiments.

Journal Article Biometrics · December 2000 In some types of cancer chemoprevention experiments and short-term carcinogenicity bioassays, the data consist of the number of observed tumors per animal and the times at which these tumors were first detected. In such studies, there is interest in distin ... Full text Cite

A Bayesian Model for Fecundability and Sterility

Journal Article Journal of the American Statistical Association · December 1, 2000 There is increasing evidence that exposure to environmental toxins during key stages of development can disrupt the human reproductive system. Such effects have proven difficult to study due to the many behavioral and biological factors involved in human r ... Full text Cite

Bayesian analysis of mutational spectra.

Journal Article Genetics · November 2000 Studies that examine both the frequency of gene mutation and the pattern or spectrum of mutational changes can be used to identify chemical mutagens and to explore the molecular mechanisms of mutagenesis. In this article, we propose a Bayesian hierarchical ... Full text Cite

The timing of the "fertile window" in the menstrual cycle: day specific estimates from a prospective study.

Journal Article BMJ (Clinical research ed.) · November 2000 Objectives: To provide specific estimates of the likely occurrence of the six fertile days (the "fertile window") during the menstrual cycle. Design: Prospective cohort study. Participants: 221 healthy women who were planning a pregnancy ... Full text Cite

Assessing overall risk in reproductive experiments.

Journal Article Risk analysis : an official publication of the Society for Risk Analysis · August 2000 Toxicologists are often interested in assessing the joint effect of an exposure on multiple reproductive endpoints, including early loss, fetal death, and malformation. Exposures that occur prior to mating or extremely early in development can adversely af ... Full text Cite

Statistical analysis of skin tumor data from Tg.AC mouse bioassays.

Journal Article Toxicological sciences : an official journal of the Society of Toxicology · June 2000 New strategies for identifying chemical carcinogens and assessing risk have been proposed based on the Tg.AC (zetaglobin promoted v-Ha-ras) transgenic mouse. Preliminary studies suggest that the Tg.AC mouse bioassay may be an effective means of quickly ev ... Full text Cite

Some Issues in Assessing Human Fertility

Journal Article Journal of the American Statistical Association · March 1, 2000 Full text Cite

Modeling human fertility in the presence of measurement error.

Journal Article Biometrics · March 2000 The probability of conception in a given menstrual cycle is closely related to the timing of intercourse relative to ovulation. Although commonly used markers of time of ovulation are known to be error prone, most fertility models assume the day of ovulati ... Full text Cite

Accounting for unreported and missing intercourse in human fertility studies

Journal Article Statistics in Medicine · 2000 In prospective studies of human fertility that attempt to identify days of ovulation, couples record each day whether they had intercourse. Depending on the design of the study, couples either (I) mark the dates of intercourse on a chart or (II) mark 'yes' ... Full text Cite

Bayesian latent variable models for clustered mixed outcomes

Journal Article Journal of the Royal Statistical Society Series B Statistical Methodology · January 1, 2000 A general framework is proposed for modelling clustered mixed outcomes. A mixture of generalized linear models is used to describe the joint distribution of a set of underlying variables, and an arbitrary function relates the underlying variables to the ob ... Full text Cite

Models for papilloma multiplicity and regression: Applications to transgenic mouse studies

Journal Article Journal of the Royal Statistical Society Series C Applied Statistics · January 1, 2000 In cancer studies that use transgenic or knockout mice, skin tumour counts are recorded over time to measure tumorigenicity. In these studies cancer biologists are interested in the effect of endogenous and/or exogenous factors on papilloma onset, multipli ... Full text Cite

Modeling tumor onset and multiplicity using transition models with latent variables.

Journal Article Biometrics · September 1999 We describe a method for modeling carcinogenicity from animal studies where the data consist of counts of the number of tumors present over time. The research is motivated by applications to transgenic rodent studies, which have emerged as an alternative t ... Full text Cite

Factors influencing growth and survival of the killifish, Rivulus marmoratus, held inside enclosures in mangrove swamps

Journal Article Copeia · August 2, 1999 We measured growth and survival in field enclosures of juvenile Rivulus marmoratus under a variety of biotic (effects of body mass and intraspecific density) and abiotic conditions (seasonal climatic changes, site-specific hypoxia). We also tested three di ... Full text Cite

Day-specific probabilities of clinical pregnancy based on two studies with imperfect measures of ovulation.

Journal Article Human reproduction (Oxford, England) · July 1999 Two studies have related the timing of sexual intercourse (relative to ovulation) to day-specific fecundability. The first was a study of Catholic couples practising natural family planning in London in the 1950s and 1960s and the second was of North Carol ... Full text Cite

Summarizing the motion of self-propelled cells: applications to sperm motility.

Journal Article Biometrics · June 1999 Proper characterization of the motion of spermatozoa is an important prerequisite for interpreting differences in sperm motility that might arise from exposure to toxicants. Patterns of sperm movement can be extremely complex. On the basis of an exponentia ... Full text Cite

Dose-dependent number of implants and implications in developmental toxicity.

Journal Article Biometrics · June 1998 This paper proposes a method for assessing risk in developmental toxicity studies with exposure prior to implantation. The method proposed in this paper was developed to account for a dose-dependent trend in the number of implantation sites per dam, which ... Full text Cite