Journal Article · Technometrics · January 1, 2025
Motivated by applications to water quality monitoring using fluorescence spectroscopy, we develop the source apportionment model for high dimensional profiles of dissolved organic matter (DOM). We describe simple methods to estimate the parameters of a lin ...
Journal Article · Journal of Survey Statistics and Methodology · November 1, 2024
Existing methods for small-area data involve a trade-off between maintaining area-level frequentist coverage rates and improving precision via the incorporation of indirect information. In this article, we develop an area-level prediction region procedure ...
Journal Article · Environmetrics · November 1, 2024
Spatial models for areal data are often constructed such that all pairs of adjacent regions are assumed to have near-identical spatial autocorrelation. In practice, data can exhibit dependence structures more complicated than can be represented under this ...
Journal Article · Annals of Applied Statistics · September 1, 2024
Directional relational event data, such as email data, often contain unicast messages (i.e., messages of one sender toward one receiver) and multicast messages (i.e., messages of one sender toward multiple receivers). The Enron email data that is the focus ...
Journal Article · Journal of Computational and Graphical Statistics · January 1, 2024
In many regression settings the unknown coefficients may have some known structure, for instance they may be ordered in space or correspond to a vectorized matrix or tensor. At the same time, the unknown coefficients may be sparse, with many nearly or exac ...
Journal Article · Biometrika · December 1, 2023
The Fréchet mean generalizes the concept of a mean to a metric space setting. In this work we consider equivariant estimation of Fréchet means for parametric models on metric spaces that are Riemannian manifolds. The geometry and symmetry of such a space a ...
Journal Article · Journal of the Royal Statistical Society. Series B: Statistical Methodology · November 1, 2023
A separable covariance model can describe the among-row and among-column correlations of a random matrix and permits likelihood-based inference with a very small sample size. However, if the assumption of separability is not met, data analysis with a separ ...
Journal Article · Canadian Journal of Statistics · September 1, 2023
In multigroup data settings with small within-group sample sizes, standard (Formula presented.) -tests of group-specific linear hypotheses can have low power, particularly if the within-group sample sizes are not large relative to the number of explanatory ...
Journal Article · Journal of the Royal Statistical Society. Series B: Statistical Methodology · September 1, 2023
In the 1930s, psychologists began developing Multiple-Factor Analysis to decompose multivariate data into a small number of interpretable factors without any a priori knowledge about those factors. In this form of factor analysis, the Varimax factor rotatio ...
Journal Article · ACS ES and T Water · August 11, 2023
Dissolved organic matter (DOM) is an important component of the biogeochemistry and ecosystem function of streams and rivers. Unlike inorganic nitrogen nutrients, organic nitrogen (DON) nutrients can vary considerably depending on the contributions of vari ...
Journal Article · Bernoulli · May 1, 2023
This article illustrates how indirect or prior information can be optimally used to construct a prediction region that maintains a target frequentist coverage rate. If the indirect information is accurate, the volume of the prediction region is lower on av ...
Journal Article · Biostatistics (Oxford, England) · December 2022
Medical research institutions have generated massive amounts of biological data by genetically profiling hundreds of cancer cell lines. In parallel, academic biology labs have conducted genetic screens on small numbers of cancer cell lines under custom exp ...
Journal Article · Annals of Statistics · December 1, 2022
The Fréchet mean is a useful description of location for a probability distribution on a metric space that is not necessarily a vector space. This article considers simultaneous estimation of multiple Fréchet means from a decision-theoretic perspective, an ...
Journal Article · Journal of the American Statistical Association · January 1, 2022
This article develops p-values for evaluating means of normal populations that make use of indirect or prior information. A p-value of this type is based on a biased frequentist hypothesis test that has optimal average power with respect to a probability d ...
Journal Article · Annals of Statistics · October 1, 2021
In matrix-valued datasets the sampled matrices often exhibit correlations among both their rows and their columns. A useful and parsimonious model of such dependence is the matrix normal model, in which the covariances among the elements of a random matrix ...
Journal Article · Journal of Computational and Graphical Statistics · January 1, 2021
Motivated by applications to Bayesian inference for statistical models with orthogonal matrix parameters, we present (Formula presented.) a general approach to Monte Carlo simulation from probability distributions on the Stiefel manifold. To bypass many of ...
Journal Article · Statistical Science · January 1, 2021
Network datasets typically exhibit certain types of statistical patterns, such as within-dyad correlation, degree heterogeneity, and triadic patterns such as transitivity and clustering. The first two of these can be well represented with a social relation ...
Journal Article · Journal of Survey Statistics and Methodology · April 1, 2020
In the analysis of survey data, it is of interest to estimate and quantify uncertainty about means or totals for each of several nonoverlapping subpopulations or areas. When the sample size for a given area is small, standard confidence intervals based on ...
Journal Article · Journal of Computational and Graphical Statistics · January 2, 2020
Many penalized maximum likelihood estimators correspond to posterior mode estimators under specific prior distributions. Appropriateness of a particular class of penalty functions can therefore be interpreted as the appropriateness of a prior for the param ...
Journal Article · Bernoulli · January 1, 2020
Random orthogonal matrices play an important role in probability and statistics, arising in multivariate analysis, directional statistics, and models of physical systems, among other areas. Calculations involving random orthogonal matrices are complicated ...
Journal Article · Annals of Applied Statistics · January 1, 2020
Quantification of stylistic differences between musical artists is of academic interest to the music community and is also useful for other applications, such as music information retrieval and recommendation systems. Information about stylistic difference ...
Journal Article · Journal of Machine Learning Research · October 1, 2019
We develop a model-based method for evaluating heterogeneity among several p × p covariance matrices in the large p, small n setting. This is done by assuming a spiked covariance model for each group and sharing information about the space spanned by the g ...
Journal Article · Computational Statistics and Data Analysis · September 1, 2019
Consider the problem of estimating the entries of an unknown mean matrix or tensor given a single noisy realization. In the matrix case, this problem can be addressed by decomposing the mean matrix into a component that is additive in the rows and columns, ...
Journal Article · Journal of Statistical Planning and Inference · July 1, 2019
In multiple testing scenarios, typically the sign of a parameter is inferred when its estimate exceeds some significance threshold in absolute value. Typically, the significance threshold is chosen to control the experimentwise type I error rate, family-wi ...
Journal Article · Social Networks · May 1, 2019
We introduce a simple and extendable coevolution model for the analysis of longitudinal network and nodal attribute data. The model features parameters that describe three phenomena: homophily, contagion and autocorrelation of the network and nodal attribu ...
Journal Article · Political Analysis · April 1, 2019
We introduce a Bayesian approach to conduct inferential analyses on dyadic data while accounting for interdependencies between observations through a set of additive and multiplicative effects (AME). The AME model is built on a generalized linear modeling ...
Journal Article · The annals of applied statistics · March 2019
Health exams determine a patient's health status by comparing the patient's measurement with a population reference range, a 95% interval derived from a homogeneous reference population. Similarly, most of the established relations among health problems are ...
Journal Article · Electronic Journal of Statistics · January 1, 2019
We propose an adaptive confidence interval procedure (CIP) for the coefficients in the normal linear regression model. This procedure has a frequentist coverage rate that is constant as a function of the model parameters, yet provides smaller intervals tha ...
Journal Article · Biometrika · June 1, 2018
Commonly used interval procedures for multigroup data attain their nominal coverage rates across a population of groups on average, but their actual coverage rate for a given group will be above or below the nominal rate, depending on the group mean. While ...
Journal Article · Computational Statistics and Data Analysis · November 1, 2017
Using a multiplicative reparametrization, it is shown that a subclass of Lq penalties with q less than or equal to one can be expressed as sums of L2 penalties. It follows that the lasso and other norm-penalized regression estimates may be obtained using a ...
Journal Article · Electronic Journal of Statistics · January 1, 2017
Many applications involve estimation of a signal matrix from a noisy data matrix. In such cases, it has been observed that estimators that shrink or truncate the singular values of the data matrix perform well when the signal matrix has approximately low r ...
Journal Article · Journal of Multivariate Analysis · December 1, 2016
Many inference techniques for multivariate data analysis assume that the rows of the data matrix are realizations of independent and identically distributed random vectors. Such an assumption will be met, for example, if the rows of the data matrix are mul ...
Journal Article · Linear Algebra and Its Applications · September 15, 2016
We develop a higher-order generalization of the LQ decomposition and show that this decomposition plays an important role in likelihood-based estimation and testing for separable, or Kronecker structured, covariance models, such as the multilinear normal m ...
Journal Article · Journal of Peace Research · May 1, 2016
Previous models of international conflict have suffered two shortfalls. They tend not to embody dynamic changes, focusing rather on static slices of behavior over time across a single relational dimension. These models have also been empirically evaluated ...
Journal Article · Bayesian Analysis · January 1, 2016
Analyses of array-valued datasets often involve reduced-rank array approximations, typically obtained via least-squares or truncations of array decompositions. However, least-squares approximations tend to be noisy in high-dimensional settings, and may not ...
Journal Article · The annals of applied statistics · September 2015
A fundamental aspect of relational data, such as from a social network, is the possibility of dependence among the relations. In particular, the relations between members of one pair of nodes may have an effect on the relations between members of another p ...
Journal Article · Journal of Multivariate Analysis · May 1, 2015
Inference about dependence in a multiway data array can be made using the array normal model, which corresponds to the class of multivariate normal distributions with separable covariance matrices. Maximum likelihood and Bayesian methods for inference in t ...
Journal Article · Journal of the Royal Statistical Society. Series B, Statistical methodology · January 2015
Prior specification for non-parametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. A statistician is unlikely to have informed opinions about all aspects of such a para ...
Journal Article · Journal of the American Statistical Association · January 2015
Relational data are often represented as a square matrix, the entries of which record the relationships between pairs of objects. Many statistical methods for the analysis of such data assume some degree of similarity or dependence between objects in terms ...
Journal Article · Journal of the American Statistical Association · January 2015
Network analysis is often focused on characterizing the dependencies between network relations and node-level attributes. Potential relationships are typically explored by modeling the network as a function of the nodal attributes or by modeling the attrib ...
Journal Article · The annals of applied statistics · March 2014
ANOVA decompositions are a standard method for describing and estimating heterogeneity among the means of a response variable across levels of multiple categorical factors. In such a decomposition, the complete set of main effects and interaction terms can ...
Journal Article · Bernoulli : official journal of the Bernoulli Society for Mathematical Statistics and Probability · January 2014
Often of primary interest in the analysis of multivariate data are the copula parameters describing the dependence among the variables, rather than the univariate marginal distributions. Since the ranks of a multivariate dataset are invariant to changes in ...
Journal Article · The annals of applied statistics · January 2014
Human mortality data sets can be expressed as multiway data arrays, the dimensions of which correspond to categories by which mortality rates are reported, such as age, sex, country and year. Regression models for such data typically assume an independent ...
Journal Article · Network science (Cambridge University Press) · December 2013
Many studies that gather social network data use survey methods that lead to censored, missing, or otherwise incomplete information. For example, the popular fixed rank nomination (FRN) scheme, often used in studies of schools and businesses, asks study pa ...
Journal Article · Stat · December 1, 2013
According to classic game theory, individuals playing a centipede game learn about the subgame perfect Nash equilibrium via repeated play of the game. We employ statistical modeling to evaluate the evidence of such learning processes while accounting for t ...
Journal Article · Bayesian Analysis · June 10, 2013
Due to their great flexibility, nonparametric Bayes methods have proven to be a valuable tool for discovering complicated patterns in data. The term "nonparametric Bayes" suggests that these methods inherit model-free operating characteristics of classical ...
Journal Article · Clinical trials (London, England) · February 2013
Background: Novel dose-finding designs for Phase I cancer clinical trials, using estimation to assign the best estimated Maximum Tolerated Dose (MTD) at each point in the experiment, most prominently via Bayesian techniques, have been widely discuss ...
Journal Article · Statistica Sinica · April 1, 2012
Classical regression analysis relates the expectation of a response variable to a linear combination of explanatory variables. In this article, we propose a covariance regression model that parameterizes the covariance matrix of a multivariate response vec ...
Journal Article · Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America · January 2012
Network models are widely used in social sciences and genome sciences. The latent space model proposed by Hoff et al. (2002), and extended by Handcock et al. (2007) to incorporate clustering, provides a visually interpretable model-based spatial representa ...
Journal Article · The international journal of biostatistics · October 2011
It is common for novel dose-finding designs to be presented without a study of their convergence properties. In this article we suggest that examination of convergence is a necessary quality check for dose-finding designs. We present a new convergence proo ...
Journal Article · Bayesian Analysis · June 16, 2011
Modern datasets are often in the form of matrices or arrays, potentially having correlations along each set of data indices. For example, data involving repeated measurements of several variables over time may exhibit temporal correlation as well as correl ...
Journal Article · Bayesian Analysis · June 16, 2011
I thank the editor for the opportunity to expand upon the paper, and I thank the discussants for their insightful comments. In this rejoinder I elaborate on some of the topics from the discussion: the appropriateness of separable covariance models for arra ...
Journal Article · Annals of Applied Statistics · June 1, 2011
The focus of this paper is an approach to the modeling of longitudinal social network or relational data. Such data arise from measurements on pairs of objects or actors made at regular temporal intervals, resulting in a social network for each point in ti ...
Journal Article · Computational Statistics and Data Analysis · January 1, 2011
Reduced-rank decompositions provide descriptions of the variation among the elements of a matrix or array. In such decompositions, the elements of an array are expressed as products of low-dimensional latent factors. This article presents a model-based ver ...
Journal Article · Modern pathology : an official journal of the United States and Canadian Academy of Pathology, Inc · December 2010
Approximately 10% of ulcerative colitis patients develop colorectal neoplasia. At present, identification of this subset is markedly limited and necessitates lifelong colonoscopic surveillance for the entire ulcerative colitis population. Better risk marke ...
Conference · Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference · December 1, 2009
This article discusses a latent variable model for inference and prediction of symmetric relational data. The model, based on the idea of the eigenvalue decomposition, represents the relationship between two nodes as the weighted inner-product of node-spec ...
Journal Article · Journal of Computational and Graphical Statistics · December 1, 2009
Orthonormal matrices play an important role in reduced-rank matrix approximations and the analysis of matrix-valued data. A matrix Bingham-von Mises-Fisher distribution is a probability distribution on the set of orthonormal matrices that includes linear a ...
Journal Article · Computational and Mathematical Organization Theory · December 1, 2009
We discuss a statistical model of social network data derived from matrix representations and symmetry considerations. The model can include known predictor information in the form of a regression term, and can represent additional structure via sender-spe ...
Journal Article · Journal of the Royal Statistical Society. Series B: Statistical Methodology · November 1, 2009
Although the covariance matrices corresponding to different populations are unlikely to be exactly equal they can still exhibit a high degree of similarity. For example, some pairs of variables may be positively correlated across most groups, whereas the c ...
Journal Article · Social networks · July 2009
Social network data often involve transitivity, homophily on observed attributes, clustering, and heterogeneity of actor degrees. We propose a latent cluster random effects model to represent all of these features, and we describe a Bayesian estimation met ...
Journal Article · Statistics in medicine · June 2009
The percentile-finding experimental design known variously as 'forced-choice fixed-staircase', 'geometric up-and-down' or 'k-in-a-row' (KR) was introduced by Wetherill four decades ago. To date, KR has been by far the most widely used up-and-down (U&D) des ...
Book · October 30, 2008
Using data over the period from 1950 to 2000, we estimate a model of bilateral international trade to explore the linkages between (a) alliances, (b) joint memberships in international institutions, (c) mutual cooperation and (d) conflict, (e) mutual econo ...
Journal Article · Journal of the American Statistical Association · June 1, 2008
We propose a two-sided method to simultaneously estimate men's and women's preferences for relative age, education, and religious characteristics of potential mates using cross-sectional data on married couples and single individuals, in conjunction with a ...
Conference · Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference · January 1, 2008
This article discusses a latent variable model for inference and prediction of symmetric relational data. The model, based on the idea of the eigenvalue decomposition, represents the relationship between two nodes as the weighted inner-product of node-spec ...
Journal Article · Journal of acquired immune deficiency syndromes (1999) · October 2007
Objective: To assess the efficacy of a peer-delivered intervention to promote short-term (6-month) and long-term (12-month) adherence to HAART in a Mozambican clinic population. Design: A 2-arm randomized controlled trial was conducted between ...
Journal Article · Journal of the American Statistical Association · June 1, 2007
Many multivariate data-analysis techniques for an m × n matrix Y are related to the model Y = M + E, where Y is an m × n matrix of full rank and M is an unobserved mean matrix of rank K < (m ∧ n). Typically the rank of M is estimated in a heuristic way and ...
Journal Article · AIDS care · May 2007
Understanding sexual behavior and assessing transmission risk among people living with HIV-1 is crucial for effective HIV-1 prevention. We describe sexual behavior among HIV-positive persons initiating highly active antiretroviral therapy (HAART) in Beira, ...
Journal Article · AIDS and behavior · March 2007
We explored methodological issues related to antiretroviral adherence assessment, using 6 months of data collected in a completed intervention trial involving 136 low-income HIV-positive outpatients in the Bronx, NY. Findings suggest that operationalizing ...
Journal Article · Journal of Peace Research · 2007
The authors examine a standard gravity model of international commerce augmented to include political as well as institutional influences on bilateral trade. Using annual data from 1980-2001, they estimate regression coefficients and residual dependencies ...
Journal Article · Bayesian Analysis · December 1, 2006
We discuss a model-based approach to identifying clusters of objects based on subsets of attributes, so that the attributes that distinguish a cluster from the rest of the population may depend on the cluster being considered. The method is based on a Póly ...
Conference · Methodology · January 1, 2006
Recent advances in latent space and related random effects models hold much promise for representing network data. The inherent dependency between ties in a network makes modeling data of this type difficult. In this article we consider a recently develope ...
Journal Article · Biometrics · December 2005
This article develops a model-based approach to clustering multivariate binary data, in which the attributes that distinguish a cluster from the rest of the population may depend on the cluster being considered. The clustering approach is based on a multiv ...
Journal Article · Journal of the American Statistical Association · March 1, 2005
This article discusses the use of a symmetric multiplicative interaction effect to capture certain types of third-order dependence patterns often present in social networks and other dyadic datasets. Such an effect, along with standard linear fixed and ran ...
Journal Article · Proceedings of the National Academy of Sciences of the United States of America · June 2004
Inherited colorectal cancer syndromes in humans exhibit regional specificity for tumor formation. By using mice with germline mutations in the adenomatous polyposis coli gene (Apc) and/or DNA mismatch repair genes, we have analyzed the genetic control of t ...
Journal Article · Political Analysis · 2004
Despite the desire to focus on the interconnected nature of politics and economics at the global scale, most empirical studies in the field of international relations assume not only that the major actors are sovereign, but also that their relationships ar ...
Journal Article · Biometrika · June 1, 2003
We discuss two methods of making nonparametric Bayesian inference on probability measures subject to a partial stochastic ordering. The first method involves a nonparametric prior for a measure on partially ordered latent observations, and the second invol ...
Journal Article · Annals of Statistics · February 1, 2003
We present a general approach to estimating probability measures constrained to lie in a convex set. We represent constrained measures as mixtures of simple, known extreme measures, and so the problem of estimating a constrained measure becomes one of esti ...
Journal Article · Journal of the American Statistical Association · December 1, 2002
Network models are widely used to represent relational information among interacting units. In studies of social networks, recent emphasis has been placed on random graph models where the nodes usually represent individual social actors and the edges repre ...
Journal Article · Proceedings of the National Academy of Sciences of the United States of America · March 2000
The interaction between mutations in the tumor-suppressor genes Apc and p53 was studied in congenic mouse strains to minimize the influence of polymorphic modifiers. The multiplicity and invasiveness of intestinal adenomas of Apc(Min/+) (Min) mice was enha ...
Journal Article · Journal of Computational and Graphical Statistics · January 1, 2000
This article discusses a new technique for calculating maximum likelihood estimators (MLEs) of probability measures when it is assumed the measures are constrained to a compact, convex set. Measures in such sets can be represented as mixtures of simple, kn ...
Journal Article · Genetics · May 1998
We have used a rat model of induced mammary carcinomas in an effort to identify breast cancer susceptibility genes. Using genetic crosses between the carcinoma-resistant Copenhagen (COP) and carcinoma-sensitive Wistar-Furth rats, we have confirmed the iden ...