Journal Article · Journal of the American Statistical Association · May 2026
Beta regression is used routinely for continuous proportional data, but it often encounters practical issues such as a lack of robustness to misspecification of the beta distribution and sensitivity to outliers. We develop an improved class of generalized ...
Journal Article · Environmental health perspectives · May 2026
BACKGROUND: Human exposure to complex, changing, and variably correlated mixtures of environmental chemicals has presented analytical challenges to epidemiologists and human health researchers. There has been a wide variety of recent advances in sta ...
Journal Article · Annals of Statistics · April 1, 2026
We study how the posterior contraction rate under a Gaussian process (GP) prior depends on the intrinsic dimension of the predictors and the smoothness of the regression function. An open question is whether a generic GP prior that does not incorporate kno ...
Journal Article · Nature ecology & evolution · March 2026
Citizen science provides large amounts of biodiversity data. Key challenges in unlocking its full potential include engaging citizens with limited species identification skills and accelerating the transition from data collection to research and monitoring ...
Journal Article · Bayesian Analysis · March 1, 2026
Dirichlet process mixtures are particularly sensitive to the value of the precision parameter controlling the behavior of the latent partition. Randomization of the precision through a prior distribution is a common solution, which leads to more robust inf ...
Journal Article · Bayesian Analysis · March 1, 2026
We propose a class of nonstationary processes to characterize space- and time-varying directional associations in point-referenced data. We are motivated by spatiotemporal modeling of air pollutants in which local wind patterns are key determinants of the p ...
Journal Article · Statistics in medicine · February 2026
In including random effects to account for dependent observations, the odds ratio interpretation of logistic regression coefficients is changed from population-averaged to subject-specific. This is unappealing in many applications, motivating a rich litera ...
Journal Article · Biometrika · January 1, 2026
It is often of interest to infer lower-dimensional structure underlying complex data. As a flexible class of nonlinear structures, it is common to focus on Riemannian manifolds. Most existing manifold-learning algorithms replace the original data with lowe ...
Journal Article · Bayesian analysis · December 2025
The beta distribution serves as a canonical tool for modeling probabilities in statistics and machine learning. However, there is limited work on flexible and computationally convenient stochastic process extensions for modeling dependent random probabilit ...
Journal Article · Nature methods · October 2025
DNA-based biodiversity surveys result in massive-scale data, including up to millions of species, of which most are rare. Making the most of such data for inference and prediction requires modeling approaches that can relate species occurrences to environm ...
Conference · Proceedings of SPIE the International Society for Optical Engineering · September 17, 2025
Alzheimer’s disease (AD) affects over 10% of people above age 65. Current treatments remain largely ineffective, thus early biomarkers are essential for devising preventive interventions, and personalizing these based on risk profiles. Brain age gap (BAG)— ...
Journal Article · Ann Appl Stat · September 2025
Sepsis is a life-threatening condition caused by a dysregulated host response to infection. Recently, researchers have hypothesized that sepsis consists of a heterogeneous spectrum of distinct subtypes, motivating several studies to identify clusters of se ...
Journal Article · Journal of the American Statistical Association · June 2025
Factor analysis provides a canonical framework for imposing lower-dimensional structure such as sparse covariance in high-dimensional data. High-dimensional data on the same set of variables are often collected under different conditions, for instance in r ...
Journal Article · Bioinformatics (Oxford, England) · May 2025
Motivation: Feature selection is a critical task in machine learning and statistics. However, existing feature selection methods either (i) rely on parametric methods such as linear or generalized linear models, (ii) lack theoretical false discovery ...
Open Access
Journal Article · Journal of the American Statistical Association · April 2025
Likelihood-based inferences have been remarkably successful in wide-spanning application areas. However, even after due diligence in selecting a good model for the data at hand, there is inevitably some amount of model misspecification: outliers, data cont ...
Journal Article · Journal of the Royal Statistical Society. Series B, Statistical methodology · April 2025
While there is an immense literature on Bayesian methods for clustering, the multiview case has received little attention. This problem focuses on obtaining distinct but statistically dependent clusterings in a common set of entities for different data typ ...
Journal Article · Machine Learning · March 1, 2025
We propose a general scheme for solving convex and non-convex optimization problems on manifolds. The central idea is that, by adding a multiple of the squared retraction distance to the objective function in question, we “convexify” the objective function ...
Journal Article · The annals of applied statistics · March 2025
Developmental epidemiology commonly focuses on assessing the association between multiple early life exposures and childhood health. Statistical analyses of data from such studies focus on inferring the contributions of individual exposures, while also cha ...
Open Access
Journal Article · Annals of Applied Statistics · March 1, 2025
There is abundant interest in assessing the joint effects of multiple exposures on human health. This is often referred to as the mixtures problem in environmental epidemiology and toxicology. Classically, studies have examined the adverse health effects o ...
Journal Article · Journal of the American Statistical Association · January 1, 2025
Many lifeline infrastructure systems consist of thousands of components configured in a complex directed network. Disruption of the infrastructure constitutes a recurrent failure process over a directed network. Statistical inference for such network recur ...
Journal Article · Journal of the American Statistical Association · January 2025
Bayesian clustering typically relies on mixture models, with each component interpreted as a different cluster. After defining a prior for the component parameters and weights, Markov chain Monte Carlo (MCMC) algorithms are commonly used to produce samples ...
Journal Article · Biometrika · January 2025
Generalized linear models are routinely used for modelling relationships between a response variable and a set of covariates. The simple form of a generalized linear model comes with easy interpretability, but also leads to concerns about model misspecific ...
Journal Article · Biometrika · January 1, 2025
Tree graphs are used routinely in statistics. When estimating a Bayesian model with a tree component, sampling the posterior remains a core difficulty. Existing Markov chain Monte Carlo methods tend to rely on local moves, often leading to poor mixing. A p ...
Journal Article · IEEE transactions on signal processing : a publication of the IEEE Signal Processing Society · January 2025
When there is a distributional shift between data used to train a predictive algorithm and current data, performance can suffer. This is known as the domain adaptation problem. Bootstrap aggregating, or bagging, is a popular method for improving the stabil ...
Journal Article · Biometrika · January 1, 2025
Joint species distribution models are popular in ecology for modelling covariate effects on species occurrence, while characterizing cross-species dependence. Data consist of multivariate binary indicators of the occurrences of different species in each sa ...
Journal Article · Biometrika · January 1, 2025
This article focuses on inference in logistic regression for high-dimensional binary outcomes. A popular approach induces dependence across the outcomes by including latent factors in the linear predictor. Bayesian approaches are useful for characterizing ...
Journal Article · Biometrika · December 2024
In geostatistical problems with massive sample size, Gaussian processes can be approximated using sparse directed acyclic graphs to achieve scalable O(n) computational complexity. In these models, data at each location are typically assumed conditionally d ...
Journal Article · Magn Reson Imaging · December 2024
Alzheimer's disease (AD) presents complex challenges due to its multifactorial nature, poorly understood etiology, and late detection. The mechanisms through which genetic and modifiable risk factors influence disease susceptibility are under intense inves ...
Journal Article · Scientific reports · December 2024
The article is motivated by an application to the EarlyBird cohort study aiming to explore how anthropometrics and clinical and metabolic processes are associated with obesity and glucose control during childhood. There is interest in inferring the relatio ...
Journal Article · The annals of applied statistics · June 2024
In this paper we predict sea surface salinity (SSS) in the Arctic Ocean based on satellite measurements. SSS is a crucial indicator for ongoing changes in the Arctic Ocean and can offer important insights about climate change. We particularly focus on area ...
Journal Article · Statistics in medicine · May 2024
Throughout the course of an epidemic, the rate at which disease spreads varies with behavioral changes, the emergence of new disease variants, and the introduction of mitigation policies. Estimating such changes in transmission rates can help us better mod ...
Journal Article · Journal of machine learning research : JMLR · March 2024
Quantifying spatial and/or temporal associations in multivariate geolocated data of different types is achievable via spatial random effects in a Bayesian hierarchical model, but severe computational bottlenecks arise when spatial dependence is encoded as ...
Journal Article · Journal of the American Statistical Association · January 2024
It has become increasingly common to collect high-dimensional binary response data; for example, with the emergence of new sampling techniques in ecology. In smaller dimensions, multivariate probit (MVP) models are routinely used for inferences. However, a ...
Journal Article · IEEE transactions on signal processing : a publication of the IEEE Signal Processing Society · January 2024
We introduce Cayley transform ellipsoid fitting (CTEF), an algorithm that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid specific, meaning it always returns elliptic solu ...
Open Access
Journal Article · Journal of Nonparametric Statistics · January 1, 2024
In a variety of application areas, there is interest in assessing evidence of differences in the intensity of event realizations between groups. For example, in cancer genomic studies collecting data on rare variants, the focus is on assessing whether and ...
Journal Article · Statistical Science · January 1, 2024
Bayesian models are powerful tools for studying complex data, allowing the analyst to encode rich hierarchical dependencies and leverage prior information. Most importantly, they facilitate a complete characterization of uncertainty through the posterior d ...
Conference · Proceedings of Machine Learning Research · January 1, 2024
In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metr ...
Journal Article · SIAM/ASA Journal on Uncertainty Quantification · January 1, 2024
In many areas of science and engineering, computer simulations are widely used as proxies for physical experiments, which can be infeasible or unethical. Such simulations are often computationally expensive, and an emulator can be trained to efficiently pr ...
Journal Article · Imaging neuroscience (Cambridge, Mass.) · January 2024
Mapping of human brain structural connectomes via diffusion magnetic resonance imaging (dMRI) offers a unique opportunity to understand brain structural connectivity and relate it to various human traits, such as cognition. However, head displacement durin ...
Journal Article · Bayesian Analysis · January 1, 2024
Markov chain Monte Carlo (MCMC) algorithms for hidden Markov models often rely on the forward-backward sampler. This makes them computationally slow as the length of the time series increases, motivating the development of sub-sampling-based approaches. Th ...
Journal Article · Biometrics · December 2023
The transmission rate is a central parameter in mathematical models of infectious disease. Its pivotal role in outbreak dynamics makes estimating the current transmission rate and uncovering its dependence on relevant covariates a core challenge in epidemi ...
Journal Article · Biometrika · September 2023
Loss-based clustering methods, such as k-means clustering and its variants, are standard tools for finding groups in data. However, the lack of quantification of uncertainty in the estimated clusters is a disadvantage. Model-based clustering based on mixtu ...
Journal Article · Applied and Computational Harmonic Analysis · September 1, 2023
This paper proposes a novel kernel-based optimization scheme to handle tasks in the analysis, e.g., signal spectral estimation and single-channel source separation of 1D non-stationary oscillatory data. The key insight of our optimization scheme for recons ...
Journal Article · J R Stat Soc Ser C Appl Stat · August 2023
Targeted brain stimulation has the potential to treat mental illnesses. We develop an approach to help design protocols by identifying relevant multi-region electrical dynamics. Our approach models these dynamics as a superposition of latent networks, wher ...
Journal Article · NeuroImage · August 2023
Our understanding of the structure of the brain and its relationships with human traits is largely determined by how we represent the structural connectome. Standard practice divides the brain into regions of interest (ROIs) and represents the connectome a ...
Journal Article · Journal of the Royal Statistical Society. Series C, Applied statistics · May 2023
We aim to infer bioactivity of each chemical by assay endpoint combination, addressing sparsity of toxicology data. We propose a Bayesian hierarchical framework which borrows information across different chemicals and assay endpoints, facilitates out-of-sa ...
Journal Article · Cereb Cortex · April 25, 2023
The selective vulnerability of brain networks in individuals at risk for Alzheimer's disease (AD) may help differentiate pathological from normal aging at asymptomatic stages, allowing the implementation of more effective interventions. We used a sample of ...
Open Access
Journal Article · Journal of the Royal Statistical Society Series B Statistical Methodology · April 1, 2023
High-dimensional categorical data are routinely collected in biomedical and social sciences. It is of great importance to build interpretable parsimonious models that perform dimension reduction and uncover meaningful latent structures from such discrete d ...
Journal Article · Journal of machine learning research : JMLR · April 2023
Bayesian mixture models are widely used for clustering of high-dimensional data with appropriate uncertainty quantification. However, as the dimension of the observations increases, posterior inference often tends to favor too many or too few clusters. Thi ...
Journal Article · Methods in Ecology and Evolution · February 1, 2023
Predicting the taxonomic affiliation of DNA sequences collected from biological samples is a fundamental step in biodiversity assessment. This task is performed by leveraging existing databases containing reference DNA sequences endowed with a taxonomic id ...
Journal Article · Journal of machine learning research : JMLR · February 2023
Mixed Membership Models (MMMs) are a popular family of latent structure models for complex multivariate data. Instead of forcing each subject to belong to a single cluster, MMMs incorporate a vector of subject-specific weights characterizing partial member ...
Journal Article · Journal of the American Statistical Association · January 1, 2023
Classification algorithms face difficulties when one or more classes have limited training data. We are particularly interested in classification trees, due to their interpretability and flexibility. When data are limited in one or more of the classes, the ...
Journal Article · Journal of the American Statistical Association · January 2023
We aim at modeling the appearance of distinct tags in a sequence of labeled objects. Common examples of this type of data include words in a corpus or distinct species in a sample. These sequential discoveries are often summarized via accumulation curves, ...
Journal Article · PLoS One · 2023
Given a large clinical database of longitudinal patient information including many covariates, it is computationally prohibitive to consider all types of interdependence between patient variables of interest. This challenge motivates the use of mutual info ...
Journal Article · Journal of the American Statistical Association · January 1, 2023
Reductions in natural habitats urge that we better understand species’ interconnection and how biological communities respond to environmental changes. However, ecological studies of species’ interactions are limited by their geographic and taxonomic focus ...
Journal Article · Bayesian Analysis · January 1, 2023
It is common to be interested in rankings or order relationships among entities. In complex settings where one does not directly measure a univariate statistic upon which to base ranks, such inferences typically rely on statistical models having entity-spe ...
Journal Article · Bayesian Analysis · January 1, 2023
An intriguing new class of piecewise deterministic Markov processes (PDMPs) has recently been proposed as an alternative to Markov chain Monte Carlo (MCMC). We propose a new class of PDMPs termed Gibbs zig-zag samplers, which allow parameters to be updated ...
Journal Article · Frontiers in neuroscience · January 2023
The brain structural connectome is generated by a collection of white matter fiber bundles constructed from diffusion weighted MRI (dMRI), acting as highways for neural activity. There has been abundant interest in studying how the structural connectome va ...
Journal Article · Journal of Machine Learning Research · January 1, 2023
There is a rich literature on Bayesian methods for density estimation, which characterize the unknown density as a mixture of kernels. Such methods have advantages in terms of providing uncertainty quantification in estimation, while being adaptive to a ri ...
Journal Article · Journal of Machine Learning Research · January 1, 2023
In multivariate data analysis, it is often important to estimate a graph characterizing dependence among p variables. A popular strategy in Gaussian graphical models and latent Gaussian graphical models uses the non-zero entries in a p × p covariance or pr ...
Journal Article · The annals of applied statistics · December 2022
Reliably learning group structures among nodes in network data is challenging in several applications. We are particularly motivated by studying covert networks that encode relationships among criminals. These data are subject to measurement errors, and ex ...
Journal Article · Biometrika · September 2022
Factorization models express a statistical object of interest in terms of a collection of simpler objects. For example, a matrix or tensor can be expressed as a sum of rank-one components. However, in practice, it can be challenging to infer the relative i ...
Journal Article · The annals of applied statistics · September 2022
We introduce a new class of semiparametric latent variable models for long memory discretized event data. The proposed methodology is motivated by a study of bird vocalizations in the Amazon rain forest; the timings of vocalizations exhibit self-similarity ...
Journal Article · Journal of mathematical biology · September 2022
The Susceptible-Infectious-Recovered (SIR) equations and their extensions comprise a commonly utilized set of models for understanding and predicting the course of an epidemic. In practice, it is of substantial interest to estimate the model parameters bas ...
Open Access
Journal Article · Bioinformatics (Oxford, England) · August 2022
Motivation: It has become routine in neuroscience studies to measure brain networks for different individuals using neuroimaging. These networks are typically expressed as adjacency matrices, with each cell containing a summary of connectivity betwe ...
Journal Article · Journal of the Royal Statistical Society Series C Applied Statistics · June 1, 2022
This article focuses on the problem of predicting a response variable based on a network-valued predictor. Our motivation is the development of interpretable and accurate predictive models for cognitive traits and neuro-psychiatric disorders based on an in ...
Journal Article · The annals of applied statistics · June 2022
Psychiatric studies of suicide provide fundamental insights on the evolution of severe psychopathologies, and contribute to the development of early treatment interventions. Our focus is on modelling different traits of psychosis and their interconnections ...
Journal Article · Journal of the Royal Statistical Society Series B Statistical Methodology · April 1, 2022
In nonparametric regression, it is common for the inputs to fall in a restricted subset of Euclidean space. Typical kernel-based methods that do not take into account the intrinsic geometry of the domain across which observations are collected may produce ...
Journal Article · Journal of the Royal Statistical Society Series A Statistics in Society · April 1, 2022
Risk assessment instruments are used across the criminal justice system to estimate the probability of some future event, such as failure to appear for a court appointment or re-arrest. The estimated probabilities are then used in making decisions at the i ...
Journal Article · Biometrika · March 1, 2022
In the main paper under subsection "3.2. Bayesian variable selection", all references to "5.2" should read: "3.1". Under subsection "5.2. Bayesian variable selection", the reference to "5.3 and 6" should read: "S5.3 and S6". These errors have now been corr ...
Journal Article · The annals of applied statistics · March 2022
Characterizing the shared memberships of individuals in a classification scheme poses severe interpretability issues, even when using a moderate number of classes (say 4). Mixed membership models quantify this phenomenon, but they typically focus on goodne ...
Journal Article · International journal of environmental research and public health · January 2022
Humans are exposed to a diverse mixture of chemical and non-chemical exposures across their lifetimes. Well-designed epidemiology studies as well as sophisticated exposure science and related technologies enable the investigation of the health impacts of m ...
Journal Article · Journal of machine learning research : JMLR · January 2022
High resolution geospatial data are challenging because standard geostatistical models based on Gaussian processes are known to not scale to large data sizes. While progress has been made towards methods that can be computed more efficiently, considerably ...
Journal Article · SIAM Journal on Scientific Computing · January 1, 2022
Subspace-valued functions arise in a wide range of problems, including parametric reduced order modeling (PROM), parameter reduction, and subspace tracking. In PROM, each parameter point can be associated with a subspace, which is used for Petrov–Galerkin ...
Journal Article · Front Neurosci · 2022
Spatial navigation and orientation are emerging as promising markers for altered cognition in prodromal Alzheimer's disease, and even in cognitively normal individuals at risk for Alzheimer's disease. The different APOE gene alleles confer various degrees ...
Journal Article · SIAM Journal on Mathematics of Data Science · January 1, 2022
In many applications, the curvature of the space supporting the data makes the statistical modeling challenging. In this paper we discuss the construction and use of probability distributions wrapped around manifolds using exponential maps. These distribut ...
Journal Article · NeuroImage · December 2021
There has been a huge interest in studying human brain connectomes inferred from different imaging modalities and exploring their relationships with human traits, such as cognition. Brain connectomes are usually represented as networks, with nodes correspo ...
Journal Article · Applied and Computational Harmonic Analysis · November 1, 2021
In the manifold setting, we provide a series of spectral convergence results quantifying how the eigenvectors and eigenvalues of the graph Laplacian converge to the eigenfunctions and eigenvalues of the Laplace-Beltrami operator in the L∞ sense. ...
Journal Article · The annals of applied statistics · September 2021
In this article we investigate group differences in phthalate exposure profiles using NHANES data. Phthalates are a family of industrial chemicals used in plastics and as solvents. There is increasing evidence of adverse health effects of exposure to phtha ...
Journal Article · The annals of applied statistics · September 2021
Today there are approximately 85,000 chemicals regulated under the Toxic Substances Control Act, with around 2,000 new chemicals introduced each year. It is impossible to screen all of these chemicals for potential toxic effects, either via full organism ...
Journal Article · Journal of the Royal Statistical Society. Series A, (Statistics in Society) · July 2021
In many application areas, predictive models are used to support or make important decisions. There is increasing awareness that these models may contain spurious or otherwise undesirable correlations. Such correlations may arise from a variety of sources, ...
Journal Article · J R Stat Soc Ser C Appl Stat · June 2021
In low-resource settings where vital registration of death is not routine it is often of critical interest to determine and study the cause of death (COD) for individuals and the cause-specific mortality fraction (CSMF) for populations. Post-mortem autopsi ...
Journal Article · Biometrika · June 2021
Posterior computation for high-dimensional data with many parameters can be challenging. This article focuses on a new method for approximating posterior distributions of a low- to moderate-dimensional parameter in the presence of a high-dimensional or oth ...
Journal Article · Bayesian analysis · March 2021
There is a very rich literature proposing Bayesian approaches for clustering starting with a prior probability distribution on partitions. Most approaches assume exchangeability, leading to simple representations in terms of Exchangeable Partition Probabil ...
Journal Article · Journal of Machine Learning Research · January 1, 2021
Statistical methods relating tensor predictors to scalar outcomes in a regression model generally vectorize the tensor predictor and estimate the coefficients of its entries employing some form of regularization, use summaries of the tensor covariate, or u ...
Journal Article · Journal of the American Statistical Association · January 2021
This article is motivated by the problem of inference on interactions among chemical exposures impacting human health outcomes. Chemicals often co-occur in the environment or in synthetic mixtures and as a result exposure levels can be highly correlated. W ...
Journal Article · Journal of Computational and Graphical Statistics · January 1, 2021
Motivated by applications to Bayesian inference for statistical models with orthogonal matrix parameters, we present a general approach to Monte Carlo simulation from probability distributions on the Stiefel manifold. To bypass many of ...
Journal Article · Electronic Journal of Statistics · January 1, 2021
Hypothesis testing of structure in covariance matrices is of significant importance, but faces great challenges in high-dimensional settings. Although consistent frequentist one-sample covariance tests have been proposed, there is a lack of simple, compu ...
Journal Article · Journal of machine learning research : JMLR · January 2021
Model-based clustering is widely used in a variety of application areas. However, fundamental concerns remain about robustness. In particular, results can be sensitive to the choice of kernel representing the within-cluster data density. Leveraging on prop ...
Journal Article · Journal of machine learning research : JMLR · January 2021
Many modern data sets require inference methods that can estimate the shared and individual-specific components of variability in collections of matrices that change over time. Promising methods have been developed to analyze these types of data in static ...
Conference · Proceedings of Machine Learning Research · January 1, 2021
Transformation-based methods have been an attractive approach in non-parametric inference for problems such as unconditional and conditional density estimation due to their unique hierarchical structure that models the data as flexible transformation of a ...
Journal Article · Biometrika · December 2020
Classification with high-dimensional data is of widespread interest and often involves dealing with imbalanced data. Bayesian classification approaches are hampered by the fact that current Markov chain Monte Carlo algorithms for posterior computation beco ...
Journal Article · Journal of the Royal Statistical Society. Series B, Statistical methodology · December 2020
Current tools for multivariate density estimation struggle when the density is concentrated near a non-linear subspace or manifold. Most approaches require the choice of a kernel, with the multivariate Gaussian kernel by far the most commonly used. Althoug ...
Journal Article · Journal of machine learning research : JMLR · December 2020
Although multivariate count data are routinely collected in many application areas, there is surprisingly little work developing flexible models for characterizing their dependence structure. This is particularly true when interest focuses on inferring the ...
Cite
Journal ArticleThe annals of applied statistics · December 2020
This article is motivated by the problem of studying the joint effect of different chemical exposures on human health outcomes. This is essentially a nonparametric regression problem, with interest being focused not on a black box for prediction but instea ...
Full textCite
Journal ArticleBiometrika · September 2020
The dimension of the parameter space is typically unknown in a variety of models that rely on factorizations. For example, in factor analysis the number of latent factors is not known and has to be inferred from the data. Although classical shrinkage prior ...
Full textCite
Journal ArticleJournal of Machine Learning Research · July 1, 2020
Closed surfaces provide a useful model for 3-d shapes, with the data typically consisting of a cloud of points in $$\mathbb{R}^3$$. The existing literature on closed surface modeling focuses on frequentist point estimation methods that join surface patches along the edg ...
Cite
Journal ArticleBiometrika · June 1, 2020
Hamiltonian Monte Carlo has emerged as a standard tool for posterior computation. In this article we present an extension that can efficiently explore target distributions with discontinuous densities. Our extension in particular enables efficient sampling ...
Full textCite
Journal ArticleBioinformatics (Oxford, England) · June 2020
Motivation: Low-dimensional representations of high-dimensional data are routinely employed in biomedical research to visualize, interpret and communicate results from different pipelines. In this article, we propose a novel procedure to directly es ...
Full textCite
Journal ArticleBiometrika · March 2020
Prior information often takes the form of parameter constraints. Bayesian methods include such information through prior distributions having constrained support. By using posterior sampling algorithms, one can quantify uncertainty without relying on asymp ...
Full textCite
Journal ArticleBiometrika · March 1, 2020
In a 1970 Biometrika paper, W. K. Hastings developed a broad class of Markov chain algorithms for sampling from probability distributions that are difficult to sample from directly. The algorithm draws a candidate value from a proposal distribution and acc ...
Full textCite
Journal ArticleEcology · February 2020
The ongoing global change and the increased interest in macroecological processes call for the analysis of spatially extensive data on species communities to understand and forecast distributional changes of biodiversity. Recently developed joint species d ...
Full textCite
Journal ArticleJournal of the American Statistical Association · January 2020
We propose a new approach for assigning weights to models using a divergence-based method (D-probabilities), relying on evaluating parametric models relative to a nonparametric Bayesian reference using Kullback-Leibler divergence. D-probabilities ar ...
Full textCite
Journal ArticleJournal of the American Statistical Association · January 1, 2020
We consider the problem of computationally efficient prediction with high dimensional and highly correlated predictors when accurate variable selection is effectively impossible. Direct application of penalization or Bayesian methods implemented with Marko ...
Full textCite
Journal ArticleBernoulli · January 1, 2020
Random orthogonal matrices play an important role in probability and statistics, arising in multivariate analysis, directional statistics, and models of physical systems, among other areas. Calculations involving random orthogonal matrices are complicated ...
Full textCite
Journal ArticleBayesian Analysis · January 1, 2020
Hamiltonian Monte Carlo (HMC) and related algorithms have become routinely used in Bayesian computation. In this article, we present a simple and provably accurate method to improve the efficiency of HMC and related algorithms with essentially no extra com ...
Full textCite
Conference37th International Conference on Machine Learning ICML 2020 · January 1, 2020
We introduce a novel regularization approach for deep learning that incorporates and respects the underlying graphical structure of the neural network. Existing regularization methods often focus on penalizing weights in a global/uniform manner that ignore ...
Cite
ConferenceProceedings of Machine Learning Research · January 1, 2020
We introduce a novel regularization approach for deep learning that incorporates and respects the underlying graphical structure of the neural network. Existing regularization methods often focus on penalizing weights in a global/uniform manner that ignore ...
Cite
Journal ArticleBayesian analysis · December 2019
Discrete random structures are important tools in Bayesian nonparametrics and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study t ...
Full textCite
ConferenceProceedings 18th IEEE International Conference on Machine Learning and Applications ICMLA 2019 · December 1, 2019
Kernel mixture models are routinely used for density estimation. However, in multivariate settings, issues arise in efficiently approximating lower-dimensional structure in the data. For example, it is common to suppose that the density is concentrated nea ...
Full textCite
Journal ArticleBlood · November 7, 2019
Burkitt lymphoma (BL) is an aggressive, MYC-driven lymphoma comprising 3 distinct clinical subtypes: sporadic BLs that occur worldwide, endemic BLs that occur predominantly in sub-Saharan Africa, and immunodeficiency-associated BLs that occur primarily in ...
Full textLink to itemCite
Journal ArticleInformation and Inference · September 19, 2019
We study Bayesian procedures for sparse linear regression when the unknown error distribution is endowed with a non-parametric prior. Specifically, we put a symmetrized Dirichlet process mixture of Gaussian prior on the error density, where the mixing dist ...
Full textCite
Journal ArticleNeuroImage · August 2019
Advanced brain imaging techniques make it possible to measure individuals' structural connectomes in large cohort studies non-invasively. Given the availability of large scale data sets, it is extremely interesting and important to build a set of advanced ...
Full textCite
Journal ArticleEcological Monographs · August 1, 2019
A large array of species distribution model (SDM) approaches has been developed for explaining and predicting the occurrences of individual species or species assemblages. Given the wealth of existing models, it is unclear which models perform best for int ...
Full textCite
Journal ArticleJournal of the American Statistical Association · July 3, 2019
Many modern applications collect highly imbalanced categorical data, with some categories relatively rare. Bayesian hierarchical models combat data sparsity by borrowing information, while also quantifying uncertainty. However, posterior computation presen ...
Full textCite
Journal ArticleJournal of the Royal Statistical Society Series B Statistical Methodology · July 1, 2019
We propose a class of intrinsic Gaussian processes (GPs) for interpolation, regression and classification on manifolds with a primary focus on complex constrained domains or irregularly shaped spaces arising as subsets or submanifolds of $$\mathbb{R}$$, $$\mathbb{R}^2$$, ...
Full textCite
Journal ArticleIEEE transactions on signal processing : a publication of the IEEE Signal Processing Society · April 2019
There is an increasing interest in learning a set of small outcome-relevant subgraphs in network-predictor regression. The extracted signal subgraphs can greatly improve the interpretation of the association between the network predictor and the response. ...
Full textCite
Journal ArticleJournal of the American Statistical Association · January 2019
The standard approach to Bayesian inference is based on the assumption that the distribution of the data belongs to the chosen model class. However, even a small violation of this assumption can have a large impact on the outcome of a Bayesian procedure. W ...
Full textCite
Journal ArticleJournal of the American Statistical Association · January 2019
In studying structural inter-connections in the human brain, it is common to first estimate fiber bundles connecting different regions relying on diffusion MRI. These fiber bundles act as highways for neural activity. Current statistical methods reduce the ...
Full textCite
Journal ArticleBernoulli · January 1, 2019
Asymptotic theory of tail index estimation has been studied extensively in the frequentist literature on extreme values, but rarely in the Bayesian context. We investigate whether popular Bayesian kernel mixture models are able to support heavy tailed dist ...
Full textCite
Journal ArticleBayesian Analysis · January 1, 2019
Gaussian processes (GPs) are very widely used for modeling of unknown functions or surfaces in applications ranging from regression to classification to spatial processes. Although there is an increasingly vast literature on applications, methods, theory a ...
Full textCite
Journal ArticleFront Neuroinform · 2019
The major genetic risk for late onset Alzheimer's disease has been associated with the presence of APOE4 alleles. However, the impact of different APOE alleles on the brain aging trajectory, and how they interact with the brain local environment in a sex s ...
Full textLink to itemCite
Journal ArticleBiometrics · December 2018
There is wide interest in studying how the distribution of a continuous response changes with a predictor. We are motivated by environmental applications in which the predictor is the dose of an exposure and the response is a health outcome. A main focus i ...
Full textCite
Journal ArticleJournal of the American Statistical Association · October 2, 2018
Studying the neurological, genetic, and evolutionary basis of human vocal communication mechanisms using animal vocalization models is an important field of neuroscience. The datasets typically comprise structured sequences of syllables or “songs” produced ...
Full textCite
Journal ArticleJournal of Machine Learning Research · October 1, 2018
There has been considerable interest in making Bayesian inference more scalable. In big data settings, most of the focus has been on reducing the computing time per iteration rather than reducing the number of iterations needed in Markov chain Monte Carlo ...
Cite
Journal ArticleJournal of Machine Learning Research · August 1, 2018
Divide-and-conquer based methods for Bayesian inference provide a general approach for tractable posterior inference when the sample size is large. These methods divide the data into smaller subsets, sample from the posterior distribution of parameters in ...
Cite
Journal ArticleJournal of Computational and Graphical Statistics · July 3, 2018
We propose a conditional density filtering (C-DF) algorithm for efficient online Bayesian inference. C-DF adapts MCMC sampling to the online setting, sampling from approximations to conditional posterior distributions obtained by propagating surrogate cond ...
Full textCite
Journal ArticleBioinformatics (Oxford, England) · July 2018
Motivation: Although there is a rich literature on methods for assessing the impact of functional predictors, the focus has been on approaches for dimension reduction that do not suit certain applications. Examples of standard approaches include fun ...
Full textCite
Journal ArticleSci Rep · June 22, 2018
High-throughput screening of compounds (chemicals) is an essential part of drug discovery, involving thousands to millions of compounds, with the purpose of identifying candidate hits. Most statistical tools, including the industry standard B-score method, ...
Full textLink to itemCite
Journal ArticleBiometrika · June 2018
There has been substantial recent interest in record linkage, where one attempts to group the records pertaining to the same entities from one or more large databases that lack unique identifiers. This can be viewed as a type of microclustering, with few o ...
Full textCite
Journal ArticleNeuroImage · May 2018
Advances in understanding the structural connectomes of human brain require improved approaches for the construction, comparison and integration of high-dimensional whole-brain tractography data from a large number of individuals. This article develops a p ...
Full textCite
Journal ArticleStatistics and Probability Letters · May 1, 2018
There is vast interest in automated methods for complex data analysis. However, there is a lack of consideration of (1) interpretability, (2) uncertainty quantification, (3) applications with limited training data, and (4) selection bias. Statistical metho ...
Full textCite
ConferenceDiabetes Care · April 2018
OBJECTIVE: Hemoglobin A1c (A1C) is used in assessment of patients for elective surgeries because hyperglycemia increases risk of adverse events. However, the interplay of A1C, glucose, and surgical outcomes remains unclarified, with often only two of these ...
Full textLink to itemCite
Journal ArticleBayesian Analysis · January 1, 2018
Network data are increasingly collected along with other variables of interest. Our motivation is drawn from neurophysiology studies measuring brain connectivity networks for a sample of individuals along with their membership to a low or high creative rea ...
Full textCite
Journal ArticlePloS one · January 2018
Understanding how groups of neurons interact within a network is a fundamental question in system neuroscience. Instead of passively observing the ongoing activity of a network, we can typically perturb its activity, either by external sensory stimulation ...
Full textCite
Journal ArticleJournal of the American Statistical Association · January 2018
We develop a generalized method of moments (GMM) approach for fast parameter estimation in a new class of Dirichlet latent variable models with mixed data types. Parameter estimation via GMM has computational and statistical advantages over alternative met ...
Full textCite
Journal ArticleBayesian Analysis · January 1, 2018
The supplementary materials contain proofs of Propositions 1, 2 and 3, providing theoretical support for the methodology developed in the article “Bayesian Inference and Testing of Group Differences in Brain Networks ...
Full textCite
Journal ArticleJournal of Machine Learning Research · December 1, 2017
We propose a novel approach to Bayesian analysis that is provably robust to outliers in the data and often has computational advantages over standard methods. Our technique is based on splitting the data into non-overlapping subgroups, evaluating the poste ...
Cite
Journal ArticleBiometrika · December 1, 2017
We consider shape-restricted nonparametric regression on a closed set $$\mathcal{X} \subset \mathbb{R},$$ where it is reasonable to assume that the function has no more than $$H$$ local extrema interior to $$\mathcal{X}$$. Following a Bayesian approach we ...
Full textLink to itemCite
Journal ArticleOperations Research · November 1, 2017
In cargo logistics, a key performance measure is transport risk, defined as the deviation of the actual arrival time from the planned arrival time. Neither earliness nor tardiness is desirable for customer and freight forwarders. In this paper, we investig ...
Full textCite
Journal ArticleCell · October 5, 2017
Diffuse large B cell lymphoma (DLBCL) is the most common form of blood cancer and is characterized by a striking degree of genetic and clinical heterogeneity. This heterogeneity poses a major barrier to understanding the genetic basis of the disease and it ...
Full textLink to itemCite
Journal ArticleJournal of the American Statistical Association · October 2, 2017
Replicated network data are increasingly available in many research fields. For example, in connectomic applications, interconnections among brain regions are collected for each patient under study, motivating statistical models which can flexibly characte ...
Full textOpen AccessCite
Journal ArticleBiometrics · September 2017
High-throughput genetic and epigenetic data are often screened for associations with an observed phenotype. For example, one may wish to test hundreds of thousands of genetic variants, or DNA methylation sites, for an association with disease status. These ...
Full textCite
Journal ArticleBiometrika · September 1, 2017
Standard posterior sampling algorithms, such as Markov chain Monte Carlo procedures, face major challenges in scaling up to massive datasets. We propose a simple and general posterior interval estimation algorithm to rapidly and accurately estimate quantil ...
Full textCite
Journal ArticleBiometrika · September 2017
Bayesian sparse factor models have proven useful for characterizing dependence in multivariate data, but scaling computation to large numbers of samples and dimensions is problematic. We propose expandable factor analysis for scalable inference in factor m ...
Full textCite
Journal ArticleJournal of Machine Learning Research · August 1, 2017
We propose a Bayesian approach to regression with a scalar response on vector and tensor covariates. Vectorization of the tensor prior to analysis fails to exploit the structure, often leading to poor estimation and predictive performance. We introduce a n ...
Cite
Journal ArticleBioinformatics (Oxford, England) · June 2017
Motivation: There is increasing interest in learning how human brain networks vary as a function of a continuous trait, but flexible and efficient procedures to accomplish this goal are limited. We develop a Bayesian semiparametric model, which comb ...
Full textOpen AccessCite
Journal ArticleBrain Behav · June 2017
INTRODUCTION: It is unknown how the brain coordinates decisions to withstand personal costs in order to prevent other individuals' distress. Here we test whether local field potential (LFP) oscillations between brain regions create "neural contexts" that s ...
Full textOpen AccessLink to itemCite
Journal ArticleBayesian Analysis · June 1, 2017
Although there are many methods for functional data analysis, less emphasis is put on characterizing variability among volatilities of individual functions. In particular, certain individuals exhibit erratic swings in their trajectory while other individua ...
Full textCite
Journal ArticleJournal of the Royal Statistical Society Series B Statistical Methodology · June 1, 2017
In many applications involving point pattern data, the Poisson process assumption is unrealistic, with the data exhibiting a more regular spread. Such repulsion between events is exhibited by trees for example, because of competition for light and nutrient ...
Full textCite
Journal ArticleJ Exp Med · May 1, 2017
Enteropathy-associated T cell lymphoma (EATL) is a lethal, and the most common, neoplastic complication of celiac disease. Here, we defined the genetic landscape of EATL through whole-exome sequencing of 69 EATL tumors. SETD2 was the most frequently silenc ...
Full textLink to itemCite
Journal ArticleProceedings. Biological sciences · May 2017
Estimation of intra- and interspecific interactions from time-series on species-rich communities is challenging due to the high number of potentially interacting species pairs. The previously proposed sparse interactions model overcomes this challenge by a ...
Full textCite
Journal ArticleEcology letters · May 2017
Community ecology aims to understand what factors determine the assembly and dynamics of species assemblages at different spatiotemporal scales. To facilitate the integration between conceptual and statistical approaches in community ecology, we propose Hi ...
Full textCite
Journal ArticleJournal of the Royal Statistical Society Series C Applied Statistics · April 1, 2017
Complex network data problems are increasingly common in many fields of application. Our motivation is drawn from strategic marketing studies monitoring customer choices of specific products, along with co-subscription networks encoding multiple-purchasing ...
Full textCite
Journal ArticleCancer Discov · April 2017
Hepatosplenic T-cell lymphoma (HSTL) is a rare and lethal lymphoma; the genetic drivers of this disease are unknown. Through whole-exome sequencing of 68 HSTLs, we define recurrently mutated driver genes and copy-number alterations in the disease. Chromati ...
Full textLink to itemCite
Journal ArticleStatistica Sinica · April 1, 2017
The Stiefel manifold $$V_{p,d}$$ is the space of all $$d \times p$$ orthonormal matrices, with the $$d-1$$ hypersphere and the space of all orthogonal matrices constituting special cases. In modeling data lying on the Stiefel manifold, parametric distributions such as the mat ...
Full textCite
Journal ArticleMethods in Ecology and Evolution · April 1, 2017
Joint species distribution models (JSDM) are increasingly used to analyse community ecology data. Recent progress with JSDMs has provided ecologists with new tools for estimating species associations (residual co-occurrence patterns after accounting for en ...
Full textCite
Journal ArticleOikos · February 1, 2017
Research on mutualistic and antagonistic networks, such as plant–pollinator and host–parasite networks, has shown that species interactions can influence and be influenced by the responses of species to environmental perturbations. Here we examine whether ...
Full textCite
Journal ArticleAnnals of statistics · January 2017
Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categoric ...
Full textCite
Journal ArticleJournal of the American Statistical Association · January 2017
We propose an extrinsic regression framework for modeling data with manifold valued responses and Euclidean predictors. Regression with manifold responses has wide applications in shape analysis, neuroscience, medical imaging and many other areas. Our appr ...
Full textCite
Journal ArticleStochastic Processes and their Applications · December 1, 2016
Two-component mixture priors provide a traditional way to induce sparsity in high-dimensional Bayes models. However, several aspects of such a prior, including computational complexities in high-dimensions, interpretation of exact zeros and non-sparse post ...
Full textCite
Journal ArticleAnnals of Applied Statistics · December 1, 2016
Our focus is on realistically modeling and forecasting dynamic networks of face-to-face contacts among individuals. Important aspects of such data that lead to problems with current methods include the tendency of the contacts to move between periods of sl ...
Full textCite
Journal ArticleBiometrika · December 2016
There is growing interest in analysing high-dimensional count data, which often exhibit quasi-sparsity corresponding to an overabundance of zeros and small nonzero counts. Existing methods for analysing multivariate count data via Poisson or negative binom ...
Full textCite
Journal ArticleJournal of Machine Learning Research · October 1, 2016
Graphical models express conditional independence relationships among variables. Although methods for vector-valued data are well established, functional data graphical models remain underdeveloped. By functional data, we refer to data that are realization ...
Cite
Journal ArticleJournal of the American Statistical Association · October 1, 2016
We consider the problem of flexible modeling of higher order Markov chains when an upper bound on the order of the chain is known but the true order and nature of the serial dependence are unknown. We propose Bayesian nonparametric methodology based on con ...
Full textCite
Journal ArticleSIAM Journal on Imaging Sciences · August 30, 2016
We propose an algorithm that removes the visually unpleasant effects of cradling in X-ray images of panel paintings, with the goal of improving the X-ray image readability by art experts. The algorithm consists of three stages. In the first stage the locat ...
Full textCite
Journal ArticleThe European journal of contraception & reproductive health care : the official journal of the European Society of Contraception · August 2016
Objectives: We propose a new, personalised approach of estimating a woman's most fertile days that only requires recording the first day of menses and can use a smartphone to convey this information to the user so that she can plan or prevent pregna ...
Full textCite
Journal ArticleNeuron · July 20, 2016
Circuits distributed across cortico-limbic brain regions compose the networks that mediate emotional behavior. The prefrontal cortex (PFC) regulates ultraslow (<1 Hz) dynamics across these networks, and PFC dysfunction is implicated in stress-related illne ...
Full textLink to itemCite
Journal ArticleStatistica Sinica · July 1, 2016
Our focus is on constructing a multiscale nonparametric prior for densities. The Bayes density estimation literature is dominated by single scale methods, with the exception of Polya trees, which favor overly-spiky densities even when the truth is smooth. ...
Full textCite
Journal ArticleStatistics & probability letters · June 2016
In population studies, it is standard to sample data via designs in which the population is divided into strata, with the different strata assigned different probabilities of inclusion. Although there have been some proposals for including sample survey we ...
Full textCite
Journal ArticleBiometrika · June 2016
We present a data augmentation scheme to perform Markov chain Monte Carlo inference for models where data generation involves a rejection sampling algorithm. Our idea is a simple scheme to instantiate the rejected proposals preceding each data point. The r ...
Full textOpen AccessCite
Journal ArticleJournal of Machine Learning Research · May 1, 2016
Nonparametric regression for large numbers of features (p) is an increasingly important problem. If the sample size n is massive, a common strategy is to partition the feature space, and then separately apply simple models to each partition set. This is no ...
Cite
Journal ArticleMethods in Ecology and Evolution · May 1, 2016
We present a hierarchical latent variable model that partitions variation in species occurrences and co-occurrences simultaneously at multiple spatial scales. We illustrate how the parameterized model can be used to predict the occurrences of a species by ...
Full textCite
Journal ArticleJournal of Computational and Graphical Statistics · April 2, 2016
High-dimensional data with hundreds of thousands of observations are becoming commonplace in many disciplines. The analysis of such data poses many computational challenges, especially when the observations are correlated over time and/or across space. In ...
Full textCite
Journal ArticleAnnals of Statistics · April 1, 2016
There is increasing interest in the problem of nonparametric regression with high-dimensional predictors. When the number of predictors D is large, one encounters a daunting problem in attempting to estimate a D-dimensional surface based on limited data. Fo ...
Full textCite
Journal ArticleBiometrics · March 2016
It is common in biomedical research to run case-control studies involving high-dimensional predictors, with the main goal being detection of the sparse subset of predictors having a significant association with disease. Usual analyses rely on independent s ...
Full textCite
Journal ArticleNeural networks : the official journal of the International Neural Network Society · March 2016
Subspace segmentation is a fundamental topic in computer vision and machine learning. However, the success of many popular methods is about independent subspace segmentation instead of the more flexible and realistic disjoint subspace segmentation. Focusin ...
Full textCite
Journal ArticleJournal of the American Statistical Association · January 2016
In many application areas, data are collected on a categorical response and high-dimensional categorical predictors, with the goals being to build a parsimonious model for classification while doing inferences on the important predictors. In settings such ...
Full textCite
Journal ArticleFront Behav Neurosci · 2016
Development of proficient spoken language skills is disrupted by mutations of the FOXP2 transcription factor. A heterozygous missense mutation in the KE family causes speech apraxia, involving difficulty producing words with complex learned sequences of sy ...
Full textOpen AccessLink to itemCite
Conference33rd International Conference on Machine Learning ICML 2016 · January 1, 2016
Ordinary least squares (OLS) is the default method for fitting linear models, but is not applicable for problems with dimensionality larger than the sample size. For these problems, we advocate the use of a generalized version of OLS motivated by ridge re ...
Cite
ConferenceAdvances in Neural Information Processing Systems · January 1, 2016
Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for down-scaling the problem size is to first partition the dataset into subsets and then fit using distributed a ...
Cite
ConferenceProceedings of the 19th International Conference on Artificial Intelligence and Statistics AISTATS 2016 · January 1, 2016
It is standard to assume a low-dimensional structure in estimating a high-dimensional density. However, popular methods, such as probabilistic principal component analysis, scale poorly computationally. We introduce a novel empirical Bayes method that we t ...
Cite
ConferenceProceedings of the 19th International Conference on Artificial Intelligence and Statistics AISTATS 2016 · January 1, 2016
We utilize copulas to constitute a unified framework for constructing and optimizing variational proposals in hierarchical Bayesian models. For models with continuous and non-Gaussian hidden variables, we propose a semiparametric and automated variational ...
Cite
Journal ArticleBioinformatics (Oxford, England) · December 2015
Motivation: Both single marker and simultaneous analysis face challenges in GWAS due to the large number of markers genotyped for a small number of subjects. This large p small n problem is particularly challenging when the trait under investigation ...
Full textCite
Journal ArticleBiometrika · December 2015
This article concerns testing for equality of distribution between groups. We focus on screening variables with shared distributional features such as common support, modes and patterns of skewness. We propose a Bayesian testing method using kernel mixture ...
Full textCite
Journal ArticleJournal of the American Statistical Association · December 2015
Penalized regression methods, such as L1 regularization, are routinely used in high-dimensional applications, and there is a rich literature on optimality properties under sparsity assumptions. In the Bayesian paradigm, sparsity is routin ...
Full textCite
Journal ArticleJournal of Machine Learning Research · December 1, 2015
Capturing predictor-dependent correlations amongst the elements of a multivariate response vector is fundamental to numerous applied domains, including neuroscience, epidemiology, and finance. Although there is a rich literature on methods for allowing the ...
Cite
Journal ArticleJournal of the American Statistical Association · October 2, 2015
As an alternative to variable selection or shrinkage in high-dimensional regression, we propose to randomly compress the predictors prior to analysis. This dramatically reduces storage and computational bottlenecks, performing well when the predictors can ...
ConferenceRecSys 2015 Proceedings of the 9th ACM Conference on Recommender Systems · September 16, 2015
Recommender systems are routinely equipped with standardized taxonomy that associates each item with one or more categories or genres. Although such information does not directly imply the quality of an item, the distribution of ratings varies greatly across ...
Journal Article · August 13, 2015
The Markov Chain Monte Carlo method is the dominant paradigm for posterior computation in Bayesian analysis. It is common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to ...
Open Access
Journal ArticleStatistics in biosciences · May 2015
In longitudinal data analysis, there is great interest in assessing the impact of predictors on the time-varying trajectory in a response variable. In such settings, an important issue is to account for heterogeneity in the shape of the trajectory among su ...
Journal ArticleContraception · April 2015
Objective: In 2001, we provided benchmark estimates of probability of pregnancy given a single act of intercourse. Those calculations assumed that intercourse and ovulation are independent. Subsequent research has shown that this assumption is not v ...
Journal ArticleInformation and Inference · March 1, 2015
Artin Armagan's and Rayan Saab's affiliations were switched in the published version of this article. Artin Armagan's affiliation should be: SAS Institute, Inc., Raleigh, NC, USA; Rayan Saab's affiliation should be: Department of Mathematics, University of ...
Journal ArticleBMC Genomics · January 22, 2015
BACKGROUND: Expression quantitative trait loci (eQTL) play an important role in the regulation of gene expression. Gene expression levels and eQTLs are expected to vary from tissue to tissue, and therefore multi-tissue analyses are necessary to fully under ...
Open Access
Journal ArticleJournal of the Royal Statistical Society. Series B, Statistical methodology · January 2015
Prior specification for non-parametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. A statistician is unlikely to have informed opinions about all aspects of such a para ...
Journal ArticleStatistics and Its Interface · January 1, 2015
Although continuous density estimation has received abundant attention in the Bayesian nonparametrics literature, there is limited theory on multivariate mixed scale density estimation. In this note, we consider a general framework to jointly model continu ...
Journal ArticleFront Behav Neurosci · 2015
In 2005, Holy and Guo advanced the idea that male mice produce ultrasonic vocalizations (USV) with some features similar to courtship songs of songbirds. Since then, studies showed that male mice emit USV songs in different contexts (sexual and other) and ...
Open Access
ConferenceJournal of Machine Learning Research · January 1, 2015
The promise of Bayesian methods for big data sets has not fully been realized due to the lack of scalable computational algorithms. For massive data, it is necessary to store and process subsets on different machines in a distributed manner. We propose a s ...
Journal ArticleJournal of the American Statistical Association · January 2015
It has become routine to collect data that are structured as multiway arrays (tensors). There is an enormous literature on low rank and sparse matrix factorizations, but limited consideration of extensions to the tensor case in statistics. The most common ...
Journal ArticleBiometrika · January 1, 2015
In many application areas, a primary focus is on assessing evidence in the data refuting the assumption of independence of Y and X conditionally on Z, with Y response variables, X predictors of interest, and Z covariates. Ideally, one would have methods av ...
Conference2015 IEEE 6th International Workshop on Computational Advances in Multi Sensor Adaptive Processing CAMSAP 2015 · January 1, 2015
Probabilistically quantifying uncertainty in parameters, predictions and decisions is a crucial component of broad scientific and engineering applications. This is however difficult if the number of parameters far exceeds the sample size. Although there ar ...
ConferenceAdvances in Neural Information Processing Systems · January 1, 2015
Variable screening is a fast dimension reduction technique for assisting high dimensional feature selection. As a preselection method, it selects a moderate size subset of candidate variables for further refining via feature selection to produce the final ...
ConferenceAdvances in Neural Information Processing Systems · January 1, 2015
The modern scale of data has brought new challenges to Bayesian inference. In particular, conventional MCMC algorithms are computationally very expensive for large data sets. A promising approach to solve this problem is embarrassingly parallel MCMC (EP-MC ...
ConferenceAdvances in Neural Information Processing Systems · January 1, 2015
Learning of low dimensional structure in multidimensional data is a canonical problem in machine learning. One common approach is to suppose that the observed data are close to a lower-dimensional smooth manifold. There are a rich variety of manifold learn ...
Journal ArticleBiometrika · December 1, 2014
Symmetric binary matrices representing relations are collected in many areas. Our focus is on dynamically evolving binary relational matrices, with interest being on inference on the relationship structure and prediction. We propose a nonparametric Bayesia ...
Journal ArticleStatistics and Computing · November 1, 2014
Bayesian hierarchical modeling with Gaussian process random effects provides a popular approach for analyzing point-referenced spatial data. For large spatial data sets, however, generic posterior sampling is infeasible due to the extremely high computatio ...
Journal ArticleJournal of the American Statistical Association · October 2014
Modeling object boundaries based on image or point cloud data is frequently necessary in medical and scientific applications ranging from detecting tumor contours for targeted radiation therapy, to the classification of organisms based on their structural ...
Journal ArticleAnnals of Applied Statistics · September 1, 2014
We discuss functional clustering procedures for nested designs, where multiple curves are collected for each subject in the study. We start by considering the application of standard functional clustering tools to this problem, which leads to groupings bas ...
Journal ArticleJournal of the American Statistical Association · July 2014
The statistics literature on functional data analysis focuses primarily on flexible black-box approaches, which are designed to allow individual curves to have essentially any shape while characterizing variability. Such methods typically cannot incorporat ...
Journal ArticleBioinformatics (Oxford, England) · June 2014
Motivation: Estimating a phenotype distribution conditional on a set of discrete-valued predictors is a commonly encountered task. For example, interest may be in how the density of a quantitative trait varies with single nucleotide polymorphisms an ...
Journal ArticleInformation and Inference · June 1, 2014
We study the behavior of the posterior distribution in high-dimensional Bayesian Gaussian linear regression models having p ≫ n, where p is the number of predictors and n is the sample size. Our focus is on obtaining quantitative finite sample bounds ensur ...
Journal ArticleBlood · May 8, 2014
In this study, we define the genetic landscape of mantle cell lymphoma (MCL) through exome sequencing of 56 cases of MCL. We identified recurrent mutations in ATM, CCND1, MLL2, and TP53. We further identified a number of novel genes recurrently mutated in ...
Journal ArticleJournal of the American Statistical Association · March 2014
There is a rich literature on Bayesian variable selection for parametric models. Our focus is on generalizing methods and asymptotic theory established for mixtures of g-priors to semiparametric linear regression models having unknown residual densi ...
Journal ArticleJournal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America · February 2014
In this article, we propose generalized Bayesian dynamic factor models for jointly modeling mixed-measurement time series. The framework allows mixed-scale measurements associated with each time series, with different measurements having different distribu ...
Journal ArticleAnnals of the Institute of Statistical Mathematics · February 2014
We consider the problem of robust Bayesian inference on the mean regression function allowing the residual density to change flexibly with predictors. The proposed class of models is based on a Gaussian process prior for the mean regression function and mi ...
Conference2014 IEEE International Conference on Image Processing ICIP 2014 · January 28, 2014
We introduce an algorithm that removes the deleterious effect of cradling on X-ray images of paintings on wooden panels. The algorithm consists of a three stage procedure. Firstly, the cradled regions are located automatically. The second step consists of ...
Journal ArticleJournal of Applied Statistics · January 1, 2014
A Bayesian statistical model is developed for analysis of the time-evolving properties of infectious disease, with a particular focus on viruses. The model employs a latent semi-Markovian state process, and the state-transition statistics are driven by thr ...
Journal ArticleJournal of Machine Learning Research · January 1, 2014
Flexible covariate-dependent density estimation can be achieved by modelling the joint density of the response and covariates as a Dirichlet process mixture. An appealing aspect of this approach is that computations are relatively easy. In this paper, we e ...
Journal ArticleBiometrika · January 1, 2014
Shape-constrained regression analysis has applications in dose-response modelling, environmental risk assessment, disease screening and many other areas. Incorporating the shape constraints can improve estimation efficiency and avoid implausible results. W ...
Journal ArticleJournal of Machine Learning Research · January 1, 2014
In modeling multivariate time series, it is important to allow time-varying smoothness in the mean and covariance process. In particular, there may be certain time intervals exhibiting rapid changes and others in which changes are slow. If such time-varyin ...
Journal ArticleJournal of the American Statistical Association · January 2014
In many applications involving functional data, prior information is available about the proportion of curves having different attributes. It is not straightforward to include such information in existing procedures for functional data analysis. Generalizi ...
Journal ArticleStatistics and Probability Letters · January 1, 2014
We propose a targeted and robust modeling of dependence in multivariate time series via dynamic networks, with time-varying predictors included to improve interpretation and prediction. The model is applied to financial markets, estimating effects of verba ...
Journal ArticleAnnals of Statistics · January 1, 2014
Sparse Bayesian factor models are routinely implemented for parsimonious dependence modeling and dimensionality reduction in high-dimensional applications. We provide theoretical understanding of such Bayesian procedures in terms of posterior convergence ra ...
Journal ArticleSIAM Journal on Optimization · January 1, 2014
Stochastic search involves finding a set of controllable parameters that minimizes an unknown objective function using a set of noisy observations. We consider the case when the unknown function is convex and a metamodel is used as a surrogate objective fu ...
Journal ArticleBiometrika · January 1, 2014
Although discrete mixture modelling has formed the backbone of the literature on Bayesian density estimation, there are some well-known disadvantages. As an alternative to discrete mixtures, we propose a class of priors based on random nonlinear functions ...
Conference31st International Conference on Machine Learning ICML 2014 · January 1, 2014
We present a scalable Bayesian framework for low-rank decomposition of multiway tensor data with missing observations. The key issue of pre-specifying the rank of the decomposition is sidestepped in a principled manner using a multiplicative gamma process ...
Conference31st International Conference on Machine Learning ICML 2014 · January 1, 2014
Many Bayesian learning methods for massive data benefit from working with small subsets of observations. In particular, significant progress has been made in scalable Bayesian learning via stochastic approximation. However, Bayesian learning methods in dis ...
ConferenceAdvances in Neural Information Processing Systems · January 1, 2014
For massive data sets, efficient computation commonly relies on distributed algorithms that store and process subsets of the data on different machines, minimizing communication costs. Our focus is on regression and classification problems involving many f ...
ConferenceJournal of Machine Learning Research · January 1, 2014
Time-varying adjacency matrices encoding the presence or absence of a relation among entities are available in many research fields. Motivated by an application to studying dynamic networks among sports teams, we propose a Bayesian nonparametric model. The ...
Journal ArticleAnnals of Statistics · January 1, 2014
In nonparametric regression problems involving multiple predictors, there is typically interest in estimating an anisotropic multivariate regression surface in the important predictors while discarding the unimportant ones. Our focus is on defining a Bayes ...
Chapter · January 1, 2014
I reflect on the past, present, and future of nonparametric Bayesian statistics. Current nonparametric Bayes research tends to be split between theoretical studies, seeking to understand relatively simple models, and machine learning, defining new models a ...
Journal ArticleBiometrika · December 1, 2013
Data on count processes arise in a variety of applications, including longitudinal, spatial and imaging studies measuring count responses. The literature on statistical models for dependent count data is dominated by models built from hierarchical Poisson ...
Journal ArticleBiometrika · December 1, 2013
We investigate the asymptotic behaviour of posterior distributions of regression coefficients in high-dimensional linear models as the number of dimensions grows with the number of observations. We show that the posterior distribution concentrates in neigh ...
Journal ArticleJournal of Machine Learning Research · November 1, 2013
We propose a new, nonparametric method for multivariate regression subject to convexity or concavity constraints on the response function. Convexity constraints are common in economics, statistics, operations research, financial engineering and optimizatio ...
Journal ArticleEpidemiology (Cambridge, Mass.) · November 2013
Background: Some environmental chemical exposures are lipophilic and need to be adjusted by serum lipid levels before data analyses. There are currently various strategies that attempt to account for this problem, but all have their drawbacks. To ad ...
Journal ArticleBioinformatics (Oxford, England) · October 2013
Motivation: In biomedical research a growing number of platforms and technologies are used to measure diverse but related information, and the task of clustering a set of objects based on multiple sources of data arises in several applications. Most ...
Journal ArticleComputational Statistics and Data Analysis · July 29, 2013
We consider modeling spatio-temporally indexed relational data, motivated by analysis of voting data for the United States House of Representatives over two decades. The data are characterized by incomplete binary matrices, representing votes of legislator ...
Journal ArticleG3 (Bethesda) · July 8, 2013
Admixture mapping is a popular tool to identify regions of the genome associated with traits in a recently admixed population. Existing methods have been developed primarily for identification of a single locus influencing a dichotomous trait within a case ...
Open Access
Journal ArticleJournal of the American Statistical Association · June 2013
Gaussian factor models have proven widely useful for parsimoniously characterizing dependence in multivariate data. There is a rich literature on their extension to mixed categorical and continuous variables, using latent Gaussian variables or through gene ...
Open Access
Journal ArticleJournal of the American Statistical Association · May 31, 2013
It has become common for datasets to contain large numbers of variables in studies conducted in areas such as genetics, machine vision, image analysis, and many others. When analyzing such data, parametric models are often too inflexible while nonparametri ...
Journal ArticleJournal of multivariate analysis · April 2013
A wide variety of priors have been proposed for nonparametric Bayesian estimation of conditional distributions, and there is a clear need for theorems providing conditions on the prior for large support, as well as posterior consistency. Estimation of an u ...
Journal ArticleBayesian Analysis · March 22, 2013
A model is presented for analysis of multivariate binary data with spatio-temporal dependencies, and applied to congressional roll call data from the United States House of Representatives and Senate. The model considers each legislator's constituency (loc ...
Journal ArticleProc Natl Acad Sci U S A · January 22, 2013
Diffuse large B-cell lymphoma (DLBCL) is the most common form of lymphoma in adults. The disease exhibits a striking heterogeneity in gene expression profiles and clinical outcomes, but its genetic causes remain to be fully defined. Through whole genome an ...
Journal ArticleAdvances in Neural Information Processing Systems · January 1, 2013
Nonparametric estimation of the conditional distribution of a response given high-dimensional features is a challenging problem. It is important to allow not only the mean but also the variance and shape of the response density to change flexibly with featu ...
Open Access
Journal ArticleJournal of the American Statistical Association · January 2013
In many applications, it is of interest to study trends over time in relationships among categorical variables, such as age group, ethnicity, religious affiliation, political party and preference for particular policies. At each time point, a sample of ind ...
Journal ArticleJournal of the American Statistical Association · January 2013
We propose a nested Gaussian process (nGP) as a locally adaptive prior for Bayesian nonparametric regression. Specified through a set of stochastic differential equations (SDEs), the nGP imposes a Gaussian process prior for the function's mth-order ...
Journal ArticleAdvances in Neural Information Processing Systems · January 1, 2013
In modeling multivariate time series, it is important to allow time-varying smoothness in the mean and covariance process. In particular, there may be certain time intervals exhibiting rapid changes and others in which changes are slow. If such locally ada ...
ConferenceJournal of Machine Learning Research · January 1, 2013
Bayesian classification commonly relies on probit models, with data augmentation algorithms used for posterior computation. By imputing latent Gaussian variables, one can often trivially adapt computational approaches used in Gaussian models. However, MCMC ...
ConferenceJournal of Machine Learning Research · January 1, 2013
There is increasing interest in broad application areas in defining flexible joint models for data having a variety of measurement scales, while also allowing data of complex types, such as functions, images and documents. We consider a general framework f ...
Book · January 1, 2013
Broadening its scope to nonstatisticians, Bayesian Methods for Data Analysis, Third Edition provides an accessible introduction to the foundations and applications of Bayesian analysis. Along with a complete reorganization of the material, this edition con ...
Journal ArticleIEEE transactions on pattern analysis and machine intelligence · January 2013
Unsupervised multi-layered ("deep") models are considered for general data, with a particular focus on imagery. The model is represented using a hierarchical convolutional factor-analysis construction, with sparse factor loadings and scores. The computatio ...
Journal ArticleBiometrika · 2013
Gaussian processes are widely used in nonparametric regression, classification and spatiotemporal modelling, facilitated in part by a rich literature on their theoretical properties. However, one of their practical limitations is expensive computation, typ ...
Open Access
Journal Article2013 18th International Conference on Digital Signal Processing DSP 2013 · January 1, 2013
The preservation of our cultural heritage is of paramount importance. Thanks to recent developments in digital acquisition techniques, powerful image analysis algorithms are developed which can be useful non-invasive tools to assist in the restoration and ...
Open Access
Journal ArticleStatistics and Its Interface · January 1, 2013
In many applications, interest focuses on assessing relationships between predictors and the quantiles of the distribution of a continuous response. For example, in epidemiology studies, cutoffs to define premature delivery have been based on the 10th perc ...
Journal ArticleStatistica Sinica · January 2013
We propose a generalized double Pareto prior for Bayesian shrinkage estimation and inferences in linear models. The prior can be obtained via a scale mixture of Laplace or normal distributions, forming a bridge between the Laplace and Normal-Jeffreys' prio ...
Journal ArticleStat Med · December 20, 2012
Biomedical studies have a common interest in assessing relationships between multiple related health outcomes and high-dimensional predictors. For example, in reproductive epidemiology, one may collect pregnancy outcomes such as length of gestation and bir ...
Journal ArticleBiometrics · December 2012
In studies involving functional data, it is commonly of interest to model the impact of predictors on the distribution of the curves, allowing flexible effects on not only the mean curve but also the distribution about the mean. Characterizing the curve fo ...
Journal ArticleNat Genet · December 2012
Burkitt lymphoma is characterized by deregulation of MYC, but the contribution of other genetic mutations to the disease is largely unknown. Here, we describe the first completely sequenced genome from a Burkitt lymphoma tumor and germline DNA from the sam ...
Journal ArticleBayesian analysis · December 2012
A nonparametric Bayesian model is proposed for segmenting time-evolving multivariate spatial point process data. An inhomogeneous Poisson process is assumed, with a logistic stick-breaking process (LSBP) used to encourage piecewise-constant spatial Poisson ...
Journal ArticleAdvances in Neural Information Processing Systems · December 1, 2012
We propose a multiresolution Gaussian process to capture long-range, non-Markovian dependencies while allowing for abrupt changes and non-stationarity. The multiresolution GP hierarchically couples a collection of smooth GPs, each defined over an element o ...
Journal ArticleAdvances in Neural Information Processing Systems · December 1, 2012
Discrete mixtures are used routinely in broad sweeping applications ranging from unsupervised settings to fully supervised multi-task learning. Indeed, finite mixtures and infinite mixtures, relying on Dirichlet processes and modifications, have become a s ...
Journal ArticleProceedings of the 29th International Conference on Machine Learning ICML 2012 · October 10, 2012
In regression analysis of counts, a lack of simple and efficient algorithms for posterior computation has made Bayesian approaches appear unattractive and thus underdeveloped. We propose a lognormal and gamma mixed negative binomial (NB) regression model f ...
Open Access
Journal ArticleProceedings of the 29th International Conference on Machine Learning ICML 2012 · October 10, 2012
This paper presents an application of statistical machine learning to the field of water-marking. We propose a new attack model on additive spread-spectrum watermarking systems. The proposed attack is based on Bayesian statistics. We consider the scenario ...
Journal ArticleProceedings of the 29th International Conference on Machine Learning ICML 2012 · October 10, 2012
Convex regression is a promising area for bridging statistical estimation and deterministic convex optimization. New piecewise linear convex regression methods (Hannah and Dunson, 2011; Magnani and Boyd, 2009) are fast and scalable, but can have instabilit ...
Journal ArticleNeuroImage · October 2012
We propose a semiparametric Bayesian local functional model (BFM) for the analysis of multiple diffusion properties (e.g., fractional anisotropy) along white matter fiber bundles with a set of covariates of interest, such as age and gender. BFM accounts fo ...
Journal ArticleJournal of Multivariate Analysis · October 1, 2012
Our first focus is prediction of a categorical response variable using features that lie on a general manifold. For example, the manifold may correspond to the surface of a hypersphere. We propose a general kernel mixture model for the joint distribution o ...
Journal ArticleAnnals of the Institute of Statistical Mathematics · August 2012
This article considers a broad class of kernel mixture density models on compact metric spaces and manifolds. Following a Bayesian approach with a nonparametric prior on the location mixing distribution, sufficient conditions are obtained on the kernel, pr ...
Journal ArticleJournal of the American Statistical Association · March 2012
Gaussian latent factor models are routinely used for modeling of dependence in continuous, binary, and ordered categorical data. For unordered categorical variables, Gaussian latent factor models lead to challenging computation and complex modeling structu ...
Chapter · January 19, 2012
It is routine in many fields to collect data having a variety of measurement scales and supports. For example, in biomedical studies for each patient one may collect functional data on a biomarker over time, gene expression values normalized to lie on a hy ...
ConferenceJournal of Machine Learning Research · January 1, 2012
In this work, we propose a hierarchical latent dictionary approach to estimate the time-varying mean and covariance of a process for which we have only limited noisy samples. We fully leverage the limited sample size and redundancy in sensor measurements by ...
Journal ArticleJournal of the American Statistical Association · January 2012
Modeling of multivariate unordered categorical (nominal) data is a challenging problem, particularly in high dimensions and cases in which one wishes to avoid strong assumptions about the dependence structure. Commonly used approaches rely on the incorpora ...
Journal ArticleIEEE transactions on image processing : a publication of the IEEE Signal Processing Society · January 2012
Nonparametric Bayesian methods are considered for recovery of imagery based upon compressive, incomplete, and/or noisy measurements. A truncated beta-Bernoulli process is employed to infer an appropriate dictionary for the data under test and also for imag ...
Journal ArticleJournal of Machine Learning Research · January 1, 2012
A beta-negative binomial (BNB) process is proposed, leading to a beta-gamma-Poisson process, which may be viewed as a "multiscoop" generalization of the beta-Bernoulli process. The BNB process is augmented into a beta-gamma-gamma-Poisson hierarchical struc ...
Open Access
Journal ArticleAdvances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011 · December 1, 2011
A new Lévy process prior is proposed for an uncountable collection of covariate-dependent feature-learning measures; the model is called the kernel beta process (KBP). Available covariates are handled efficiently via the kernel construction, with covariate ...
Journal ArticleAdvances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011 · December 1, 2011
The nested Chinese restaurant process is extended to design a nonparametric topic-model tree for representation of human choices. Each tree path corresponds to a type of person, and each node (topic) has a corresponding probability vector over items that m ...
Journal ArticleJournal of the American Statistical Association · December 2011
Although Bayesian nonparametric mixture models for continuous data are well developed, there is a limited literature on related approaches for count data. A common strategy is to use a mixture of Poissons, which unfortunately is quite restrictive in not ac ...
Journal ArticleProceedings of the 28th International Conference on Machine Learning ICML 2011 · October 7, 2011
A convolutional factor-analysis model is developed, with the number of filters (factors) inferred via the beta process (BP) and hierarchical BP, for single-task and multi-task learning, respectively. The computation of the model parameters is implemented w ...
Journal ArticleProceedings of the 28th International Conference on Machine Learning ICML 2011 · October 7, 2011
Storage problems are an important subclass of stochastic control problems. This paper presents a new method, approximate dynamic programming for storage, to solve storage problems with continuous, convex decision sets. Unlike other solution procedures, ADP ...
Journal ArticleBiometrika · September 2011
Density regression models allow the conditional distribution of the response given predictors to change flexibly over the predictor space. Such models are much more flexible than nonparametric mean regression models with nonparametric residual distribution ...
Journal ArticleBiometrics · September 2011
Current status data are a type of interval-censored event time data in which all the individuals are either left or right censored. For example, our motivation is drawn from a cross-sectional study, which measured whether or not fibroid onset had occurred ...
Journal ArticleJ Am Stat Assoc · September 1, 2011
Latent class models (LCMs) are used increasingly for addressing a broad variety of problems, including sparse modeling of multivariate and longitudinal data, model-based clustering, and flexible inferences on predictor effects. Typical frequentist LCMs req ...
Journal ArticleICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings · August 18, 2011
A dependent hierarchical beta process (dHBP) is developed as a prior for data that may be represented in terms of a sparse set of latent features (dictionary elements), with covariate-dependent feature usage. The dHBP is applicable to general covariates an ...
Journal ArticleStatistics and Probability Letters · August 1, 2011
It is increasingly common to be faced with longitudinal or multi-level data sets that have large numbers of predictors and/or a large sample size. Current methods of fitting and inference for mixed effects models tend to perform poorly in such settings. Wh ...
Journal ArticlePLoS computational biology · July 2011
Protein-protein interactions (PPIs) are essential to most fundamental cellular processes. There has been increasing interest in reconstructing PPIs networks. However, several critical difficulties exist in obtaining reliable predictions. Noticeably, false ...
Journal ArticleBiometrics · June 2011
This article considers the problem of selecting predictors of time to an event from a high-dimensional set of candidate predictors using data from multiple studies. As an alternative to the current multistage testing approaches, we propose to model the stu ...
Journal ArticleBiometrika · June 2011
We focus on sparse modelling of high-dimensional covariance matrices using Bayesian latent factor models. We propose a multiplicative gamma process shrinkage prior on the factor loadings which allows introduction of infinitely many factors, with the loadin ...
Journal ArticleSankhya B · May 1, 2011
By choosing a species sampling random probability measure for the distribution of the basis coefficients, a general class of nonparametric Bayesian methods for clustering of functional data is developed. Allowing the basis functions to be unknown, one face ...
Journal ArticleTechnometrics : a journal of statistics for the physical, chemical, and engineering sciences · May 2011
In studies where data are generated from multiple locations or sources it is common for there to exist observations that are quite unlike the majority. Motivated by the application of establishing a reference value in an inter-laboratory setting when outly ...
Journal ArticleJ Neurosci · April 27, 2011
Alterations in anxiety-related processing are observed across many neuropsychiatric disorders, including bipolar disorder. Though polymorphisms in a number of circadian genes confer risk for this disorder, little is known about how changes in circadian gen ...
Journal ArticleJournal of the American Statistical Association · March 2011
Tropospheric ozone is one of the six criteria pollutants regulated by the United States Environmental Protection Agency under the Clean Air Act and has been linked with several adverse health effects, including mortality. Due to the strong dependence on we ...
Journal ArticleBiometrika · March 2011
We consider geostatistical models that allow the locations at which data are collected to be informative about the outcomes. A Bayesian approach is proposed, which models the locations using a log Gaussian Cox process, while modelling the outcomes conditio ...
Journal ArticleBayesian analysis · March 2011
We describe a novel class of Bayesian nonparametric priors based on stick-breaking constructions where the weights of the process are constructed as probit transformations of normal random variables. We show that these priors are extremely flexible, allowi ...
Journal ArticleAnnals of the Institute of Statistical Mathematics · February 2011
As a generalization of the Dirichlet process (DP) to allow predictor dependence, we propose a local Dirichlet process (lDP). The lDP provides a prior distribution for a collection of random probability measures indexed by predictors. This is accomplished b ...
Journal ArticleStatistics & probability letters · February 2011
We focus on Bayesian variable selection in regression models. One challenge is to search the huge model space adequately, while identifying high posterior probability regions. In the past decades, the main focus has been on the use of Markov chain Monte Ca ...
ConferenceAdvances in Neural Information Processing Systems 24 25th Annual Conference on Neural Information Processing Systems 2011 NIPS 2011 · January 1, 2011
A new Lévy process prior is proposed for an uncountable collection of covariate-dependent feature-learning measures; the model is called the kernel beta process (KBP). Available covariates are handled efficiently via the kernel construction, with covariate ...
ConferenceAdvances in Neural Information Processing Systems 24 25th Annual Conference on Neural Information Processing Systems 2011 NIPS 2011 · January 1, 2011
The nested Chinese restaurant process is extended to design a nonparametric topic-model tree for representation of human choices. Each tree path corresponds to a type of person, and each node (topic) has a corresponding probability vector over items that m ...
ConferenceAdvances in Neural Information Processing Systems 24 25th Annual Conference on Neural Information Processing Systems 2011 NIPS 2011 · January 1, 2011
In recent years, a rich variety of shrinkage priors have been proposed that have great promise in addressing massive regression problems. In general, these new priors can be expressed as scale mixtures of normals, but have more complex forms and better pro ...
Journal ArticleJ Am Stat Assoc · January 1, 2011
There is often interest in predicting an individual's latent health status based on high-dimensional biomarkers that vary over time. Motivated by time-course gene expression array data that we have collected in two influenza challenge studies performed wit ...
Journal ArticleJournal of Machine Learning Research · January 1, 2011
A dependent hierarchical beta process (dHBP) is developed as a prior for data that may be represented in terms of a sparse set of latent features, with covariate-dependent feature usage. The dHBP is applicable to general covariates and data models, imposin ...
Journal ArticleProceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning · January 2011
A tree-structured multiplicative gamma process (TMGP) is developed, for inferring the depth of a tree-based factor-analysis model. This new model is coupled with the nested Chinese restaurant process, to nonparametrically infer the depth and width (structu ...
Journal ArticleProceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning · January 2011
A new hierarchical tree-based topic model is developed, based on nonparametric Bayesian techniques. The model has two unique attributes: (i) a child node in the tree may have more than one parent, with the goal of eliminating redundant sub-topics de ...
Journal ArticleJournal of machine learning research : JMLR · January 2011
A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via ...
Journal ArticleAdvances in Neural Information Processing Systems · 2011
In recent years, a rich variety of shrinkage priors have been proposed that have great promise in addressing massive regression problems. In general, these new priors can be expressed as scale mixtures of normals, but have more complex forms and better pro ...
Journal Article2010 IEEE Sensor Array and Multichannel Signal Processing Workshop SAM 2010 · December 20, 2010
The Beta-Binomial processes are considered for inferring missing values in matrices. The model moves beyond the low-rank assumption, modeling the matrix columns as residing in a nonlinear subspace. Large-scale problems are considered via efficient Gibbs sa ...
Journal ArticleBiometrika · December 2010
Statistical analysis on landmark-based shape spaces has diverse applications in morphometrics, medical diagnostics, machine vision and other areas. These shape spaces are non-Euclidean quotient manifolds. To conduct nonparametric inferences, one may define ...
Journal ArticleIEEE transactions on signal processing : a publication of the IEEE Signal Processing Society · December 2010
Nonparametric Bayesian methods are employed to constitute a mixture of low-rank Gaussians, for data x ∈ ℝ^N that are of high dimension N but are constrained to reside in a low-dimensional subregion of ℝ^N ...
Journal ArticleAdvances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010 · December 1, 2010
We consider problems for which one has incomplete binary matrices that evolve with time (e.g., the votes of legislators on particular legislation, with each year characterized by a different such matrix). An objective of such analysis is to infer structure ...
Journal ArticleBMC Bioinformatics · November 9, 2010
BACKGROUND: Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis ...
Journal ArticleStatistica Sinica · October 2010
Mixtures provide a useful approach for relaxing parametric assumptions. Discrete mixture models induce clusters, typically with the same cluster allocation for each parameter in multivariate cases. As a more flexible approach that facilitates sparse nonpar ...
Journal ArticleBiometrics · September 2010
Dynamic latent class models provide a flexible framework for studying biologic processes that evolve over time. Motivated by studies of markers of the fertile days of the menstrual cycle, we propose a discrete-time dynamic latent class framework, allowing ...
Journal ArticleComputational statistics & data analysis · September 2010
In parametric hierarchical models, it is standard practice to place mean and variance constraints on the latent variable distributions for the sake of identifiability and interpretability. Because incorporation of such constraints is challenging in semipar ...
Journal ArticleStatistica Sinica · July 1, 2010
Starting with a carefully formulated Dirichlet process (DP) mixture model, we derive a generalized product partition model (GPPM) in which the partition process is predictor-dependent. The GPPM generalizes DP clustering to relax the exchangeability assumpt ...
Journal ArticleBiostatistics (Oxford, England) · July 2010
In various application areas, prior information is available about the direction of the effects of multiple predictors on the conditional response distribution. For example, in epidemiology studies of potentially adverse exposures and continuous health res ...
Journal ArticleBiometrics · June 2010
High-dimensional and highly correlated data leading to non- or weakly identified effects are commonplace. Maximum likelihood will typically fail in such situations and a variety of shrinkage methods have been proposed. Standard techniques, such as ridge re ...
Journal ArticleJournal of the American Statistical Association · June 1, 2010
The dynamic hierarchical Dirichlet process (dHDP) is developed to model complex sequential data, with a focus on audio signals from music. The music is represented in terms of a sequence of discrete observations, and the sequence is modeled using a hidden ...
Journal ArticleJournal of the American Statistical Association · April 2010
We develop a model for stochastic processes with random marginal distributions. Our model relies on a stick-breaking construction for the marginal distribution of the process, and introduces dependence across locations by using a latent Gaussian copula mod ...
Journal ArticleJournal of machine learning research : JMLR · March 2010
A non-parametric hierarchical Bayesian framework is developed for designing a classifier, based on a mixture of simple (linear) classifiers. Each simple classifier is termed a local "expert", and the number of experts and their construction are manifested ...
ConferenceAdvances in Neural Information Processing Systems 23 24th Annual Conference on Neural Information Processing Systems 2010 NIPS 2010 · January 1, 2010
We consider problems for which one has incomplete binary matrices that evolve with time (e.g., the votes of legislators on particular legislation, with each year characterized by a different such matrix). An objective of such analysis is to infer structure ...
Journal ArticleThe international journal of biostatistics · January 2010
Stochastic search variable selection (SSVS) algorithms provide an appealing and widely used approach for searching for good subsets of predictors while simultaneously estimating posterior model probabilities and model-averaged predictive distributions. Thi ...
Journal ArticleJournal of the American Statistical Association · December 2009
This article considers a methodology for flexibly characterizing the relationship between a response and multiple predictors. Goals are (1) to estimate the conditional response distribution addressing the distributional changes across the predictor space, ...
Journal ArticleICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings · September 23, 2009
We propose a multi-task learning (MTL) framework for nonlinear classification, based on an infinite set of local experts in feature space. The usage of local experts enables sharing at the expert-level, encouraging the borrowing of information even if task ...
Journal ArticleICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings · September 23, 2009
A Bayesian dynamic model is developed to model complex sequential data, with a focus on audio signals from music. The music is represented in terms of a sequence of discrete observations, and the sequence is modeled using a hidden Markov model (HMM) with t ...
Journal ArticleBiometrics · September 2009
A variety of flexible approaches have been proposed for functional data analysis, allowing both the mean curve and the distribution about the mean to be unknown. Such methods are most useful when there is limited prior information. Motivated by application ...
Journal ArticleEpidemiology (Cambridge, Mass.) · July 2009
Background: Insulin-like growth factor-I (IGF-I) and insulin stimulate cell proliferation in uterine leiomyoma (fibroid) tissue. We hypothesized that circulating levels of these proteins would be associated with increased prevalence and size of uter ...
Journal ArticleJournal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America · June 2009
Factor analytic models are widely used in social sciences. These models have also proven useful for sparse modeling of the covariance structure in multidimensional data. Normal prior distributions for factor loadings and inverse gamma prior distributions f ...
Journal ArticleStatistica Sinica · April 1, 2009
We focus on developing nonparametric Bayes methods for collections of dependent random functions, allowing individual curves to vary flexibly while adaptively borrowing information. A prior is proposed, which is expressed as a hierarchical mixture of weigh ...
Journal ArticleBiometrical journal. Biometrische Zeitschrift · April 2009
In biomedical research, hierarchical models are very widely used to accommodate dependence in multivariate and longitudinal data and for borrowing of information across data from different sources. A primary concern in hierarchical modeling is sensitivity ...
Journal ArticleIEEE Transactions on Signal Processing · January 29, 2009
Compressive sensing (CS) is a framework whereby one performs N nonadaptive measurements to constitute a vector v ∈ ℝ^N, with v used to recover an approximation û ∈ ℝ^M to a desired signal u ∈ ℝ^M with N ≪ M; this is performed under ...
Journal ArticleBiometrika · January 2009
This paper focuses on the problem of choosing a prior for an unknown random effects distribution within a Bayesian hierarchical model. The goal is to obtain a sparse representation by allowing a combination of global and local borrowing of information. A l ...
Journal ArticleJournal of the American Statistical Association · January 2009
Motivated by the need to understand and predict early pregnancy loss using hormonal indicators of pregnancy health, this paper proposes a semiparametric Bayes approach for assessing the relationship between functional predictors and a response. A multivari ...
Journal ArticleBiostatistics (Oxford, England) · January 2009
Finite mixtures of Gaussian distributions are known to provide an accurate approximation to any unknown density. Motivated by DNA repair studies in which data are collected for samples of cells from different individuals, we propose a class of hierarchical ...
Journal ArticleAdvances in neural information processing systems · January 2009
A non-parametric Bayesian model is proposed for processing multiple images. The analysis employs image features and, when present, the words associated with accompanying annotations. The model clusters the images into classes, and each image is segmented i ...
Journal ArticleBiometrika · January 2009
In many modern experimental settings, observations are obtained in the form of functions, and interest focuses on inferences on a collection of such functions. We propose a hierarchical model that allows us to simultaneously estimate multiple curves nonpar ...
Journal ArticleBiometrika · December 2008
This article considers Bayesian inference about collections of unknown distributions subject to a partial stochastic ordering. To address problems in testing of equalities between groups and estimation of group-specific distributions, we propose classes of ...
Journal ArticleIEEE Transactions on Signal Processing · August 1, 2008
A new hierarchical nonparametric Bayesian framework is proposed for the problem of multi-task learning (MTL) with sequential data. The models for multiple tasks, each characterized by sequential data, are learned jointly, and the intertask relationships ar ...
Journal ArticleThe Journal of allergy and clinical immunology · July 2008
Background: Breast-feeding clearly protects against early wheezing, but recent data suggest that it might increase later risk of atopic disease and asthma. Objective: We sought to examine the relationship between breast-feeding and later asthm ...
Journal ArticleBiometrika · June 1, 2008
We propose a class of kernel stick-breaking processes for uncountable collections of dependent random probability measures. The process is constructed by first introducing an infinite sequence of random locations. Independent random probability measures an ...
Journal ArticleBiometrics · June 2008
In certain biomedical studies, one may anticipate changes in the shape of a response distribution across the levels of an ordinal predictor. For instance, in toxicology studies, skewness and modality might change as dose increases. To address this issue, w ...
Journal ArticleJournal of the American Statistical Association · June 1, 2008
In epidemiologic studies, there is often interest in assessing the relationship between polymorphisms in functionally related genes and a health outcome. For each candidate gene, single nucleotide polymorphism (SNP) data are collected at a number of locati ...
Journal ArticleJournal of the American Statistical Association · March 1, 2008
In analyzing data from multiple related studies, it often is of interest to borrow information across studies and to cluster similar studies. Although parametric hierarchical models are commonly used, of concern is sensitivity to the form chosen for the ra ...
Journal ArticleJournal of the American Statistical Association · January 2008
In epidemiology, it is often of interest to assess how individuals with different trajectories over time in an environmental exposure or biomarker differ with respect to a continuous response. For ease in interpretation and presentation of results, epidemi ...
Journal ArticleProceedings of the 25th International Conference on Machine Learning · January 1, 2008
The dynamic hierarchical Dirichlet process (dHDP) is developed to model the time-evolving statistical properties of sequential data sets. The data collected at any time point are represented via a mixture associated with an appropriate underlying model, in ...
Journal ArticleProceedings of the 25th International Conference on Machine Learning · January 1, 2008
The kernel stick-breaking process (KSBP) is employed to segment general imagery, imposing the condition that patches (small blocks of pixels) that are spatially proximate are more likely to be associated with the same cluster (segment). The number of clust ...
Journal ArticleProceedings of the 25th International Conference on Machine Learning · January 1, 2008
Compressive sensing (CS) is an emerging field that, under appropriate conditions, can significantly reduce the number of measurements required for a given signal. In many applications, one is interested in multiple signals that may be measured in multiple CS ...
Journal ArticleJournal of the American Statistical Association · January 1, 2008
In multicenter studies, subjects in different centers may have different outcome distributions. This article is motivated by the problem of nonparametric modeling of these distributions, borrowing information across centers while also allowing centers to b ...
Journal ArticleJournal of the American Statistical Association · December 1, 2007
In many applications, interest focuses on assessing the relationship between a predictor and a multivariate outcome variable, and there may be prior knowledge about the shape of the regression curves. For example, regression functions that relate dose of a ...
Journal Article · December 1, 2007
This chapter focuses on Bayesian structural equation modeling. Structural equation models (SEMs) with latent variables are routinely used in social science research, and are of increasing importance in biomedical applications. Standard practice in implemen ...
Journal ArticleBiostatistics (Oxford, England) · October 2007
For large data sets, it can be difficult or impossible to fit models with random effects using standard algorithms due to memory limitations or high computational burdens. In addition, it would be advantageous to use the abundant information to relax assum ...
Journal ArticleFertility and sterility · October 2007
Objective: To find optimal clinical rules that maximize the probability of conception while limiting the number of intercourse days required. Design: Multicenter prospective study. Women were followed prospectively while they kept daily record ...
Journal ArticleStatistical methods in medical research · October 2007
Latent trait models have long been used in the social science literature for studying variables that can only be measured indirectly through multiple items. However, such models are also very useful in accounting for correlation in multivariate and longitu ...
Journal ArticleBiometrics · September 2007
This article considers methodology for hierarchical functional data analysis, motivated by studies of reproductive hormone profiles in the menstrual cycle. Current methods standardize the cycle lengths and ignore the timing of ovulation within the cycle, b ...
Journal ArticleACM International Conference Proceeding Series · August 23, 2007
A new hierarchical nonparametric Bayesian model is proposed for the problem of multitask learning (MTL) with sequential data. Sequential data are typically modeled with a hidden Markov model (HMM), for which one often must choose an appropriate model struc ...
Journal ArticleACM International Conference Proceeding Series · August 23, 2007
In multi-task learning our goal is to design regression or classification models for each of the tasks and appropriately share information between tasks. A Dirichlet process (DP) prior can be used to encourage task clustering. However, the DP prior does no ...
Journal ArticleAmerican journal of epidemiology · May 2007
Time to pregnancy, typically defined as the number of menstrual cycles required to achieve a clinical pregnancy, is widely used as a measure of couple fecundity in epidemiologic studies. Time to pregnancy studies seldom utilize detailed data on the timing ...
Journal ArticleStatistica Sinica · April 1, 2007
In Bayesian hierarchical modeling, it is often appealing to allow the conditional density of an (observable or unobservable) random variable Y to change flexibly with categorical and continuous predictors X. A mixture of regression models is proposed, with ...
Journal ArticleJournal of the Royal Statistical Society Series B Statistical Methodology · April 1, 2007
The paper considers Bayesian methods for density regression, allowing a random probability distribution to change flexibly with multiple predictors. The conditional response distribution is expressed as a non-parametric mixture of regression models, with t ...
Journal ArticleStatistics in medicine · April 2007
With societal trends towards increasing age at starting a pregnancy attempt, many women are concerned about achieving conception before the onset of infertility, which precedes menopause. Couples failing to conceive a pregnancy within 12 months are classif ...
Journal ArticleEpidemiology (Cambridge, Mass.) · March 2007
Studies that include individuals with multiple highly correlated exposures are common in epidemiology. Because standard maximum likelihood techniques often fail to converge in such instances, hierarchical regression methods have seen increasing use. Bayesi ...
Journal ArticleAmerican journal of epidemiology · January 2007
The relation between physical activity and uterine leiomyomata (fibroids) has received little study, but exercise is protective for breast cancer, another hormonally mediated tumor. Participants in this study were randomly selected members of a health plan ...
Journal ArticleBiometrics · December 2006
Many biomedical studies collect data on times of occurrence for a health event that can occur repeatedly, such as infection, hospitalization, recurrence of disease, or tumor onset. To analyze such data, it is necessary to account for within-subject depende ...
Journal ArticlePaediatric and perinatal epidemiology · November 2006
There is increasing interest in identifying predictors of human fertility, including environmental exposures, behavioural factors, and biomarkers, such as mucus or reproductive hormones. Epidemiological studies typically measure fecundability, the per mens ...
Journal ArticleBiostatistics (Oxford, England) · October 2006
Studies of latent traits often collect data for multiple items measuring different aspects of the trait. For such data, it is common to consider models in which the different items are manifestations of a normal latent variable, which depends on covariates ...
Journal ArticleJournal of Statistical Planning and Inference · September 1, 2006
We examine the effects of modelling errors, such as underfitting and overfitting, on the asymptotic power of tests of association between an explanatory variable x and an outcome in the setting of generalized linear models. The regression function for x is ...
Journal ArticleBiometrics · June 2006
The generalized linear mixed model (GLMM), which extends the generalized linear model (GLM) to incorporate random effects characterizing heterogeneity among subjects, is widely used in analyzing correlated and longitudinal data. Although there is often int ...
Journal ArticleEuropean journal of obstetrics, gynecology, and reproductive biology · March 2006
Objective: To provide estimates of the probabilities of conception according to vulvar mucus observations classified by the woman on the day of intercourse. Study design: Prospective cohort study of 193 outwardly healthy Italian women using th ...
Journal ArticleJournal of the Society for Gynecologic Investigation · February 2006
Objective: Human chorionic gonadotropin (hCG) has proliferative effects on uterine smooth muscle and leiomyoma tissue in vitro. We hypothesized that luteinizing hormone (LH) would have the same effect by activating the LH/hCG receptor, and it would ...
Chapter · January 1, 2006
In recent years there has been increasing concern that human exposure to environmental agents may disrupt the endocrine system and alter reproduction. For example, some studies have observed secular declines in semen quality over the past 50 years, and sev ...
Journal ArticleBiometrics · December 2005
In regression applications with categorical predictors, interest often focuses on comparing the null hypothesis of homogeneity to an ordered alternative. This article proposes a Bayesian approach for addressing this problem in the setting of normal linear ...
Journal ArticleHandbook of Statistics · December 1, 2005
With the rapid increase in biomedical technology and the accompanying generation of complex and high-dimensional data sets, Bayesian statistical methods have become much more widely used. One reason is that the Bayesian probability modeling machinery provi ...
Journal ArticleBiometrics · September 2005
In longitudinal studies and in clustered situations often binary and continuous response variables are observed and need to be modeled together. In a recent publication Dunson, Chen, and Harry (2003, Biometrics 59, 521-530) (DCH) propose a Bayesian approac ...
Journal ArticleBiometrika · September 1, 2005
In many applications, researchers are interested in estimating the mean of a multivariate normal random vector whose components are subject to order restrictions. Various authors have demonstrated that the likelihood-based methodology may perform poorly un ...
Journal ArticleAmerican journal of epidemiology · September 2005
Polychlorinated biphenyls (PCBs), once used widely in transformers and other applications, and 1,1-dichloro-2,2-bis(p-chlorophenyl)ethylene (DDE), the main metabolite of the pesticide 1,1,1-trichloro-2,2-bis(p-chlorophenyl)ethane (DDT), are hormonally acti ...
Journal ArticleBiostatistics (Oxford, England) · July 2005
Samples of curves are collected in many applications, including studies of reproductive hormone levels in the menstrual cycle. Many approaches have been proposed for correlated functional data of this type, including smoothing spline methods and other flex ...
Journal ArticleJournal of the American Statistical Association · June 1, 2005
This article proposes a semiparametric Bayesian approach for inference on an unknown isotonic regression function, f(x), characterizing the relationship between a continuous predictor, X, and a count response variable, Y, adjusting for covariates, Z. A Dir ...
Journal Article · Lifetime data analysis · June 2005
Although Cox proportional hazards regression is the default analysis for time to event data, there is typically uncertainty about whether the effects of a predictor are more appropriately characterized by a multiplicative or additive model. To accommodate ...
Journal Article · Obstetrics and gynecology · April 2005
Objective: Cervical mucus is vital in the regulation of sperm survival and transport through the reproductive tract. The goal of this study is to assess whether the lowered fertility for men in their late 30s and early 40s is related to the nature o ...
Journal Article · Journal of Nonparametric Statistics · April 1, 2005
Suppose data consist of a random sample from a distribution function F_Y, which is unknown, and that interest focuses on inferences on θ, a vector of quantiles of F_Y. When the likelihood function is not fully specified, a posterior de ...
Journal Article · Biometrics · March 2005
Reproductive scientists and couples attempting pregnancy are interested in identifying predictors of the day-specific probabilities of conception in relation to the timing of a single intercourse act. Because most menstrual cycles have multiple days of int ...
Journal Article · Environmental research · February 2005
Use of 1,1,1-trichloro-2,2-bis(p-chlorophenyl)ethane (DDT) continues in about 25 countries. This use has been justified partly by the belief that it has no adverse consequences on human health. Evidence has been increasing, however, for adverse reproductiv ...
Journal Article · Biostatistics (Oxford, England) · January 2005
In studies of complex health conditions, mixtures of discrete outcomes (event time, count, binary, ordered categorical) are commonly collected. For example, studies of skin tumorigenesis record latency time prior to the first tumor, increases in the number ...
Journal Article · Biometrics · December 2004
Researchers often measure stress using questionnaire data on the occurrence of potentially stress-inducing life events and the strength of reaction to these events, characterized as negative or positive and assigned an ordinal ranking. In studying the heal ...
Journal Article · Biometrics · September 2004
Bayesian analyses of multivariate binary or categorical outcomes typically rely on probit or mixed effects logistic regression models that do not have a marginal logistic structure for the individual outcomes. In addition, difficulties arise when simple no ...
Journal Article · Biometrics · September 2004
In studying rates of occurrence and progression of lesions (or tumors), it is typically not possible to obtain exact onset times for each lesion. Instead, data consist of the number of lesions that reach a detectable size between screening examinations, al ...
Journal Article · Human reproduction (Oxford, England) · July 2004
Background: Intercourse in mammals is often coordinated with ovulation, for example through fluctuations in libido or by the acceleration of ovulation with intercourse. Such coordination has not been established in humans. We explored this possibili ...
Journal Article · Biometrics · June 2004
In multivariate survival analysis, investigators are often interested in testing for heterogeneity among clusters, both overall and within specific classes. We represent different hypotheses about the heterogeneity structure using a sequence of gamma frail ...
Journal Article · Lifetime data analysis · June 2004
When estimating the distributions of two random variables, X and Y, investigators often have prior information that Y tends to be bigger than X. To formalize this prior belief, one could potentially assume stochastic ordering between X and Y, which implies ...
Journal Article · Biometrics · June 2004
In many applications, the mean of a response variable can be assumed to be a nondecreasing function of a continuous predictor, controlling for covariates. In such cases, interest often focuses on estimating the regression function, while also assessing evi ...
Journal Article · Human reproduction (Oxford, England) · April 2004
Background: Intercourse results in a pregnancy essentially only if it occurs during the 6-day fertile interval ending on the day of ovulation. The strong association between timing of intercourse within this interval and the probability of conceptio ...
Journal Article · Obstetrics and gynecology · January 2004
Objective: To estimate the effects of aging on the percentage of outwardly healthy couples who are sterile (completely unable to conceive without assisted reproduction) or infertile (unable to conceive within a year of unprotected intercourse). M ...
Journal Article · Environmental health perspectives · January 2004
Although there has been growing concern about the effects of environmental exposures on human fertility, standard epidemiologic study designs may not collect sufficient data to identify subtle effects while properly adjusting for confounding. In particular ...
Open Access
Journal Article · Arsenic Exposure and Health Effects V · December 18, 2003
Epidemiological studies indicate that inorganic arsenicals produce various skin lesions as well as skin, lung, bladder, liver, prostate, and renal cancer. Our laboratory previously demonstrated that low-dose 12-O-tetradecanoylphorbol-13-acetate (TPA) incre ...
Journal Article · Biometrics · December 2003
We address the important practical problem of how to select the random effects component in a linear mixed model. A hierarchical Bayesian model is used to identify any random effect with zero variance. The proposed approach reparameterizes the mixed model ...
Journal Article · Biometrics · December 2003
In studying the relationship between an ordered categorical predictor and an event time, it is standard practice to include dichotomous indicators of the different levels of the predictor in a Cox model. One can then use a multiple degree-of-freedom score ...
Journal Article · Journal of the American Statistical Association · September 1, 2003
This article presents a new approach for analysis of multidimensional longitudinal data, motivated by studies using an item response battery to measure traits of an individual repeatedly over time. A general modeling framework is proposed that allows mixtu ...
Journal Article · Biometrics · September 2003
In applications that involve clustered data, such as longitudinal studies and developmental toxicity experiments, the number of subunits within a cluster is often correlated with outcomes measured on the individual subunits. Analyses that ignore this depen ...
Journal Article · Biometrics · June 2003
In biomedical studies, there is often interest in assessing the association between one or more ordered categorical predictors and an outcome variable, adjusting for covariates. For a k-level predictor, one typically uses either a k-1 degree of freedom (df ...
Journal Article · Biometrics · June 2003
Often a response of interest cannot be measured directly and it is necessary to rely on multiple surrogates, which can be assumed to be conditionally independent given the latent response and observed covariates. Latent response models typically assume tha ...
Journal Article · Obstetrics and gynecology · June 2003
Objective: To assess the day-specific and cycle-specific probabilities of conception leading to clinical pregnancy, in relation to the timing of intercourse and vulvar mucus observations. Methods: This was a retrospective cohort study of women ...
Journal Article · Mathematical Population Studies · April 1, 2003
Information on the timing of intercourse relative to ovulation can be incorporated into time to pregnancy models to improve the power to detect covariate effects, to estimate the day-specific conception probabilities, and to distinguish between biological ...
Journal Article · Epidemiology (Cambridge, Mass.) · March 2003
Uterine fibroids are benign tumors, the etiology of which is not understood. Symptoms can be debilitating, and the primary treatment is surgery, usually hysterectomy. Epidemiologic data show that pregnancy is associated with reduced risk of fibroids. We hy ...
Journal Article · Biometrics · March 2003
In epidemiologic studies, there is often interest in assessing the association between exposure history and disease incidence. For many diseases, incidence may depend not only on cumulative exposure, but also on the ages at which exposure occurred. This ar ...
Journal Article · Journal of the American Statistical Association · March 1, 2003
Cervical mucus hydration increases during the fertile interval before ovulation. Because sperm can only penetrate mucus having a high water content, cervical secretions provide a reliable marker of the fertile days of the menstrual cycle. This article deve ...
Journal Article · The Journal of allergy and clinical immunology · February 2003
Background: Asthma prevalence has increased dramatically in recent years, especially among children. Breast-feeding might protect children against asthma and related conditions (recurrent wheeze), and this protective effect might depend on the durat ...
Journal Article · American journal of obstetrics and gynecology · January 2003
Objective: Uterine leiomyoma, or fibroid tumors, are the leading indication for hysterectomy in the United States, but the proportion of women in whom fibroid tumors develop is not known. This study screened for fibroid tumors, independently of clin ...
Journal Article · Biometrics · December 2002
In the absence of longitudinal data, the current presence and severity of disease can be measured for a sample of individuals to investigate factors related to disease incidence and progression. In this article, Bayesian discrete-time stochastic models are ...
Journal Article · Human and Ecological Risk Assessment · October 1, 2002
Substantial improvements in dose response modeling for risk assessment may result from recent and continuing advances in biological research, biochemical techniques, biostatistical/mathematical methods and computational power. This report provides a ranked ...
Journal Article · Cancer research · June 2002
Nonsteroidal anti-inflammatory drugs are widely reported to inhibit carcinogenesis in humans and in rodents. These drugs are believed to act by inhibiting one or both of the known isoforms of cyclooxygenase (COX). However, COX-2, and not COX-1, is the isof ...
Journal Article · Human reproduction (Oxford, England) · May 2002
Background: Most analyses of age-related changes in fertility cannot separate effects due to reduced frequency of sexual intercourse from effects directly related to ageing. Information on intercourse collected daily through each menstrual cycle pro ...
Journal Article · Biometrics · March 2002
To assess the protective effects of a time-varying covariate, we develop a stochastic model based on tumor biology. The model assumes that individuals have a Poisson-distributed pool of initiated clones, which progress through predetectable, detectable mor ...
Journal Article · Biometrics · March 2002
Multivariate current status data consist of indicators of whether each of several events occur by the time of a single examination. Our interest focuses on inferences about the joint distribution of the event times. Conventional methods for analysis of mu ...
Journal Article · Nucleic acids research · January 2002
Using a lacZ plasmid transgenic mouse model, spectra of spontaneous point mutations were determined in brain, heart, liver, spleen and small intestine in young and old mice. While similar at a young age, the mutation spectra among these organs were signifi ...
Journal Article · Biometrics · December 2001
Time to pregnancy studies that identify ovulation days and collect daily intercourse data can be used to estimate the day-specific probabilities of conception given intercourse on a single day relative to ovulation. In this article, a Bayesian semiparametr ...
Journal Article · Human reproduction (Oxford, England) · November 2001
Background: The TwoDay Algorithm is a simple method for identifying the fertile window. It classifies a day as fertile if cervical secretions are present on that day or were present on the day before. This approach may be an effective alternative to ...
Journal Article · JAMA · October 2001
Context: Pregnancy test kits routinely recommend testing "as early as the first day of the missed period." However, a pregnancy cannot be detected before the blastocyst implants. Due to natural variability in the timing of ovulation, implantation do ...
Journal Article · J Infect Dis · July 15, 2001
Many human immunodeficiency virus (HIV)-infected persons receive prolonged treatment with DNA-reactive antiretroviral drugs. A prospective study was conducted of 26 HIV-infected men who provided samples before treatment and at multiple times after beginnin ...
Journal Article · American journal of epidemiology · June 2001
In the past decade, there have been enormous advances in the use of Bayesian methodology for analysis of epidemiologic data, and there are now many practical advantages to the Bayesian approach. Bayesian models can easily accommodate unobserved variables s ...
Journal Article · Biometrics · June 2001
In some cross-sectional studies of chronic disease, data consist of the age at examination, whether the disease was present at the exam, and recall of the age at first diagnosis. This article describes a flexible parametric approach for combining current s ...
Journal Article · Toxicology letters · May 2001
The Tg.AC mouse carrying the v-Ha-ras structural gene is a useful model for the study of chemical carcinogens, especially those acting via non-genotoxic mechanisms. This study evaluated the efficacy of the non-toxic, water-soluble antioxidant from spinach, ...
Journal Article · Contraception · April 2001
Emergency post-coital contraceptives effectively reduce the risk of pregnancy, but their degree of efficacy remains uncertain. Measurement of efficacy depends on the pregnancy rate without treatment, which cannot be measured directly. We provide indirect e ...
Journal Article · Statistics in medicine · March 2001
In modelling human fertility one ideally accounts for timing of intercourse relative to ovulation. Measurement error in identifying the day of ovulation can bias estimates of fecundability parameters and attenuate estimates of covariate effects. In the abs ...
Journal Article · Journal of Agricultural Biological and Environmental Statistics · March 1, 2001
Skin painting studies on transgenic mice have recently been approved by the Food and Drug Administration (FDA) for carcinogenicity testing. Data consist of serial skin tumor counts on the backs of shaved mice in each of several dose groups. Current methods ...
Journal Article · Biometrics · March 2001
This article describes a general class of factor analytic models for the analysis of clustered multivariate data in the presence of informative missingness. We assume that there are distinct sets of cluster-level latent variables related to the primary out ...
Chapter · January 1, 2001
One of the pleasures of working as an applied statistician is the awareness it brings of the wide diversity of scientific fields to which our profession contributes critical concepts and methods. My own awareness was enhanced by accepting the invitation fr ...
Journal Article · Journal of the Royal Statistical Society Series C Applied Statistics · January 1, 2001
Statistical inference about tumorigenesis should focus on the tumour incidence rate. Unfortunately, in most animal carcinogenicity experiments, tumours are not observable in live animals and censoring of the tumour onset times is informative. In this paper ...
Journal Article · Biometrics · December 2000
In some types of cancer chemoprevention experiments and short-term carcinogenicity bioassays, the data consist of the number of observed tumors per animal and the times at which these tumors were first detected. In such studies, there is interest in distin ...
Journal Article · Journal of the American Statistical Association · December 1, 2000
There is increasing evidence that exposure to environmental toxins during key stages of development can disrupt the human reproductive system. Such effects have proven difficult to study due to the many behavioral and biological factors involved in human r ...
Journal Article · Genetics · November 2000
Studies that examine both the frequency of gene mutation and the pattern or spectrum of mutational changes can be used to identify chemical mutagens and to explore the molecular mechanisms of mutagenesis. In this article, we propose a Bayesian hierarchical ...
Journal Article · BMJ (Clinical research ed.) · November 2000
Objectives: To provide specific estimates of the likely occurrence of the six fertile days (the "fertile window") during the menstrual cycle. Design: Prospective cohort study. Participants: 221 healthy women who were planning a pregnancy ...
Journal Article · Risk analysis : an official publication of the Society for Risk Analysis · August 2000
Toxicologists are often interested in assessing the joint effect of an exposure on multiple reproductive endpoints, including early loss, fetal death, and malformation. Exposures that occur prior to mating or extremely early in development can adversely af ...
Journal Article · Toxicological sciences : an official journal of the Society of Toxicology · June 2000
New strategies for identifying chemical carcinogens and assessing risk have been proposed based on the Tg.AC (zetaglobin promoted v-Ha-ras) transgenic mouse. Preliminary studies suggest that the Tg.AC mouse bioassay may be an effective means of quickly ev ...
Journal Article · Biometrics · March 2000
The probability of conception in a given menstrual cycle is closely related to the timing of intercourse relative to ovulation. Although commonly used markers of time of ovulation are known to be error prone, most fertility models assume the day of ovulati ...
Journal Article · Statistics in Medicine · 2000
In prospective studies of human fertility that attempt to identify days of ovulation, couples record each day whether they had intercourse. Depending on the design of the study, couples either (I) mark the dates of intercourse on a chart or (II) mark 'yes' ...
Journal Article · Journal of the Royal Statistical Society Series B Statistical Methodology · January 1, 2000
A general framework is proposed for modelling clustered mixed outcomes. A mixture of generalized linear models is used to describe the joint distribution of a set of underlying variables, and an arbitrary function relates the underlying variables to the ob ...
Journal Article · Journal of the Royal Statistical Society Series C Applied Statistics · January 1, 2000
In cancer studies that use transgenic or knockout mice, skin tumour counts are recorded over time to measure tumorigenicity. In these studies cancer biologists are interested in the effect of endogenous and/or exogenous factors on papilloma onset, multipli ...
Journal Article · Biometrics · September 1999
We describe a method for modeling carcinogenicity from animal studies where the data consist of counts of the number of tumors present over time. The research is motivated by applications to transgenic rodent studies, which have emerged as an alternative t ...
Journal Article · Copeia · August 2, 1999
We measured growth and survival in field enclosures of juvenile Rivulus marmoratus under a variety of biotic (effects of body mass and intraspecific density) and abiotic conditions (seasonal climatic changes, site-specific hypoxia). We also tested three di ...
Journal Article · Human reproduction (Oxford, England) · July 1999
Two studies have related the timing of sexual intercourse (relative to ovulation) to day-specific fecundability. The first was a study of Catholic couples practising natural family planning in London in the 1950s and 1960s and the second was of North Carol ...
Journal Article · Biometrics · June 1999
Proper characterization of the motion of spermatozoa is an important prerequisite for interpreting differences in sperm motility that might arise from exposure to toxicants. Patterns of sperm movement can be extremely complex. On the basis of an exponentia ...
Journal Article · Biometrics · June 1998
This paper proposes a method for assessing risk in developmental toxicity studies with exposure prior to implantation. The method proposed in this paper was developed to account for a dose-dependent trend in the number of implantation sites per dam, which ...