Journal Article · Epidemiology (Cambridge, Mass.) · November 2024
Surveys are commonly used to facilitate research in epidemiology, health, and the social and behavioral sciences. Often, these surveys are not simple random samples, and respondents are given weights reflecting their probability of selection into the surve ...
Journal Article · Journal of the Royal Statistical Society Series A: Statistics in Society · October 23, 2024
Many population surveys do not provide information on respondents’ residential addresses, instead offering coarse geographies like zip code or higher aggregations. However, fine resolution geography ...
Journal Article · Journal of Privacy and Confidentiality · August 27, 2024
We describe the launching of the Society for Privacy and Confidentiality Research (SPCR). SPCR is the new owner of the Journal of Privacy and Confidentiality, with the goal of ensuring a sustainable future for the Journal, and continuing to publish the mul ...
Journal Article · American journal of epidemiology · August 2024
There is a profound need to identify modifiable risk factors to screen and prevent pancreatic cancer. Air pollution, including fine particulate matter (PM2.5), is increasingly recognized as a risk factor for cancer. We conducted a case-control study using ...
Book · April 25, 2024
Protecting privacy and ensuring confidentiality in data is a critical component of modernizing our national data infrastructure. The use of blended data - combining previously collected data sources - presents new considerations for responsible data stewar ...
Journal Article · Journal of statistical planning and inference · March 2024
We consider causal inference for observational studies with data spread over two files. One file includes the treatment, outcome, and some covariates measured on a set of individuals, and the other file includes additional causally-relevant covariates meas ...
Journal Article · Journal of Survey Statistics and Methodology · February 1, 2024
In many survey settings, population counts or percentages are available for some of the variables in the survey, for example, from censuses, administrative databases, or other high-quality surveys. We present a model-based approach to utilize such auxiliar ...
Journal Article · American Statistician · January 1, 2024
Data stewards and analysts can promote transparent and trustworthy science and policy-making by facilitating assessments of the sensitivity of published results to alternate analysis choices. For example, researchers may want to assess whether the results ...
Journal Article · Proceedings of the National Academy of Sciences of the United States of America · October 2023
The use of formal privacy to protect the confidentiality of responses in the 2020 Decennial Census of Population and Housing has triggered renewed interest and debate over how to measure the disclosure risks and societal benefits of the published data prod ...
Journal Article · Transactions on Data Privacy · January 1, 2023
When initially proposed, synthetic data for disclosure control was generally dismissed as unlikely to be implemented in practice. Thirty years later, synthetic data are becoming a staple of the disclosure limitation toolkit. We now see synthetic public use ...
Journal Article · Journal of Privacy and Confidentiality · July 29, 2022
Several algorithms exist for creating differentially private counts from contingency tables, such as two-way or three-way marginal counts. The resulting noisy counts generally do not correspond to a coherent contingency table, so that some post-processing ...
Journal Article · Journal of Survey Statistics and Methodology · June 1, 2022
Recently, several organizations have considered using differentially private algorithms for disclosure limitation when releasing count data. The typical approach is to add random noise to the counts sampled from, for example, a Laplace distribution or symm ...
Journal Article · American Statistician · January 1, 2022
We consider settings where an analyst of multiply imputed data desires an integer-valued point estimate and an associated interval estimate, for example, a count of the number of individuals with certain characteristics in a population. Even when the point ...
Journal Article · Bayesian Analysis · January 1, 2022
In some scenarios, the observational data needed for causal inferences are spread over two data files. In particular, we consider scenarios where one file includes covariates and the treatment measured on a set of individuals, and a second file includes re ...
Journal Article · Journal of the Royal Statistical Society. Series A, (Statistics in Society) · April 2021
Often, government agencies and survey organizations know the population counts or percentages for some of the variables in a survey. These may be available from auxiliary sources, for example, administrative databases or other high quality surveys. We pres ...
Journal Article · Transactions on Data Privacy · April 1, 2021
To reduce disclosure risks, statistical agencies and other organizations can release noisy counts that satisfy differential privacy. In some contexts, the released counts satisfy additive constraints; for example, the released value of a total should equa ...
Journal Article · Journal of Statistical Computation and Simulation · January 1, 2021
Baseline covariates in randomized experiments are often used in the estimation of treatment effects, for example, when estimating treatment effects within covariate-defined subgroups. In practice, however, covariate values may be missing for some data subj ...
Book · January 1, 2021
ADMINISTRATIVE RECORDS FOR SURVEY METHODOLOGY Addresses the international use of administrative records for large-scale surveys, censuses, and other statistical purposes. Administrative Records for Survey Methodology is a comprehensive guide to improving t ...
Chapter · January 1, 2021
Linking subjects in a planned medical study to records in administrative databases, such as electronic health records and Medicare claims data, can enable researchers to evaluate long-term outcomes, as well as outcomes not measured in the planned study, wi ...
Journal Article · Journal of Statistical Theory and Practice · September 1, 2020
We present a Bayesian mixture model for estimating the joint distribution of mixed ordinal, nominal, and continuous data conditional on a set of fixed variables. The modeling strategy is motivated by applied contexts in marketing and the social sciences, i ...
Journal Article · American Statistician · April 2, 2020
We present a Wilson interval for binomial proportions for use with multiple imputation for missing data. Using simulation studies, we show that it can have better repeated sampling properties than the usual confidence interval for binomial proportions base ...
Journal Article · J Palliat Med · January 2020
Background: Hospital referral regions (HRRs) are often used to characterize inpatient referral patterns, but it is unknown how well these geographic regions are aligned with variation in Medicare-financed hospice care, which is largely provided at home. Ob ...
Conference · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2020
Often data analysts use probabilistic record linkage techniques to match records across two data sets. Such matching can be the primary goal, or it can be a necessary step to analyze relationships among the variables in the data sets. We propose a Bayesian ...
Journal Article · Biometrika · December 1, 2019
We study a class of missingness mechanisms, referred to as sequentially additive nonignorable, for modelling multivariate data with item nonresponse. These mechanisms explicitly allow the probability of nonresponse for each variable to depend on the value ...
Journal Article · Journal of Survey Statistics and Methodology · December 1, 2019
The National Science Foundation-Census Bureau Research Network (NCRN) was established in 2011 to create interdisciplinary research nodes on methodological questions of interest and significance to the broader research community and to the Federal Statistic ...
Journal Article · Journal of Survey Statistics and Methodology · June 1, 2019
Often in surveys, key items are subject to measurement errors. Given just the data, it can be difficult to determine the extent and distribution of this error process and, hence, to obtain accurate inferences that involve the error-prone variables. In some ...
Report · April 3, 2019
Many data producers seek to provide users access to confidential data without unduly compromising data subjects’ privacy and confidentiality. One general strategy is to require users to do analyses without seeing the confidential data; for example, analyst ...
Journal Article · Annual Review of Statistics and Its Application · March 7, 2019
Federal statistics agencies strive to release data products that are informative for many purposes, yet also protect the privacy and confidentiality of data subjects' identities and sensitive attributes. This article reviews the role that differential priv ...
Journal Article · Transactions on Data Privacy · December 1, 2018
One approach for releasing public use files is to make synthetic data, i.e., data simulated from statistical models estimated on the confidential data. Given access only to synthetic data, users cannot tell whether the synthetic data have been constructed ...
Journal Article · Journal of Computational and Graphical Statistics · October 2, 2018
Many analyses require linking records from two databases comprising overlapping sets of individuals. In the absence of unique identifiers, the linkage procedure often involves matching on a set of categorical variables, such as demographics, common to both ...
Journal Article · Statistics in medicine · October 2018
We develop methodology for causal inference in observational studies when using propensity score subclassification on data constructed with probabilistic record linkage techniques. We focus on scenarios where covariates and binary treatment assignments are ...
Journal Article · Statistica Sinica · October 1, 2018
With nonignorable missing data, likelihood-based inference should be based on the joint distribution of the study variables and their missingness indicators. These joint models cannot be estimated from the data alone, thus requiring the analyst to impose r ...
Journal Article · J Palliat Med · August 2018
BACKGROUND: Use of the Medicare hospice benefit has been associated with high-quality care at the end of life, and hospice length of use in particular has been used as a proxy for appropriate timing of hospice enrollment. Quantile regression has been under ...
Journal Article · Review of Economics and Statistics · July 1, 2018
In the U.S. Census Bureau's 2002 and 2007 Censuses of Manufactures, 79% and 73% of observations, respectively, have imputed data for at least one variable used to compute total factor productivity (TFP). The bureau primarily imputes for missing values usin ...
Journal Article · REVSTAT-Statistical Journal · April 1, 2018
We present a joint modeling approach for multiple imputation of missing continuous and categorical variables using Bayesian mixture models. The approach extends the idea of focused clustering, in which one separates variables into two sets before estimatin ...
Journal Article · Sci Rep · January 8, 2018
Baseball players must be able to see and react in an instant, yet it is hotly debated whether superior performance is associated with superior sensorimotor abilities. In this study, we compare sensorimotor abilities, measured through 8 psychomotor tasks co ...
Journal Article · Journal of Applied Statistics · January 2, 2018
Business establishment microdata typically are required to satisfy agency-specified edit rules, such as balance equations and linear inequalities. Inevitably some establishments' reported data violate the edit rules. Statistical agencies correct faulty val ...
Journal Article · J Sports Sci · January 2018
This study aimed to evaluate the possibility that differences in sensorimotor abilities exist between hitters and pitchers in a large cohort of baseball players of varying levels of experience. Secondary data analysis was performed on 9 sensorimotor tasks ...
Journal Article · Knowledge and Information Systems · January 1, 2018
Linear and logistic regression are popular statistical techniques for analyzing multivariate data. Typically, analysts do not simply posit a particular form of the regression model, estimate its parameters, and use the results for inference or prediction. ...
Journal Article · Bayesian Analysis · January 1, 2018
We present a Bayesian model for estimating the joint distribution of multivariate categorical data when units are nested within groups. Such data arise frequently in social science settings, for example, people living in households. The model assumes that ...
Journal Article · Energy Policy · December 1, 2017
Lack of information is commonly cited as a market failure resulting in an energy-efficiency gap. Government information policies to fill this gap may enable improvements in energy efficiency and social welfare because of the externalities of energy use. Th ...
Journal Article · Journal of the American Statistical Association · October 2, 2017
In categorical data, it is typically the case that some combinations of variables are theoretically impossible, such as a 3-year-old child who is married or a man who is pregnant. In practice, however, reported values often include such structural zeros du ...
Journal Article · Journal of Official Statistics · September 1, 2017
We present an approach to inform decisions about nonresponse follow-up sampling. The basic idea is (i) to create completed samples by imputing nonrespondents' data under various assumptions about the nonresponse mechanisms, (ii) take hypothetical samples o ...
Journal Article · American journal of epidemiology · July 2017
The National Cancer Institute's Surveillance, Epidemiology, and End Results Program releases research files of cancer registry data. These files include geographic information at the county level, but no finer. Access to finer geography, such as census tra ...
Journal Article · American Statistician · April 3, 2017
Multiple imputation is a common approach for dealing with missing values in statistical databases. The imputer fills in missing values with draws from predictive models estimated from the observed data, resulting in multiple, completed versions of the data ...
Journal Article · Biometrika · March 1, 2017
We introduce a nonresponse mechanism for multivariate missing data in which each study variable and its nonresponse indicator are conditionally independent given the remaining variables and their nonresponse indicators. This is a nonignorable missingness m ...
Journal Article · Bayesian Analysis · January 1, 2017
In some contexts, mixture models can fit certain variables well at the expense of others in ways beyond the analyst's control. For example, when the data include some variables with non-trivial amounts of missing values, the mixture model may fit the margi ...
Journal Article · Annals of Applied Statistics · December 1, 2016
In data fusion, analysts seek to combine information from two databases comprised of disjoint sets of individuals, in which some variables appear in both databases and other variables appear in only one database. Most data fusion techniques rely on variant ...
Journal Article · Journal of the American Statistical Association · October 1, 2016
We present a nonparametric Bayesian joint model for multivariate continuous and categorical variables, with the intention of developing a flexible engine for multiple imputation of missing values. The model fuses Dirichlet process mixtures of multinomial d ...
Conference · Proceedings - IEEE International Conference on Data Mining, ICDM · July 2, 2016
Linear and logistic regression are popular statistical techniques for analyzing multivariate data. Typically, analysts do not simply posit a particular form of the regression model, estimate its parameters, and use the results for inference or prediction. ...
Journal Article · Bayesian Analysis · June 1, 2016
We present an approach to incorporating informative prior beliefs about marginal probabilities into Bayesian latent class models for categorical data. The basic idea is to append synthetic observations to the original data such that (i) the empirical distr ...
Journal Article · Annals of Applied Statistics · March 1, 2016
Many panel studies collect refreshment samples—new, randomly sampled respondents who complete the questionnaire at the same time as a subsequent wave of the panel. With appropriate modeling, these samples can be leveraged to correct inferences for biases c ...
Journal Article · Statistical Journal of the IAOS · February 27, 2016
In contrast to the many public-use microdata samples available for individual and household data from many statistical agencies around the world, there are virtually no establishment or firm microdata available. In large part, this difficulty in providing ...
Journal Article · Statistical Journal of the IAOS · February 27, 2016
We present approaches to generating synthetic microdata for multivariate data that take on non-negative integer values, such as magnitude data in economic surveys. The basic idea is to estimate a mixture of Poisson distributions to describe the multivariat ...
Journal Article · Statistical Journal of the IAOS · February 27, 2016
Several statistical agencies release synthetic microdata, i.e., data with all confidential values replaced with draws from statistical models, in order to protect data subjects' confidentiality. While fully synthetic data are safe from record linkage attac ...
Journal Article · Statistics in medicine · November 2015
There are many advantages to individual participant data meta-analysis for combining data from multiple studies. These advantages include greater power to detect effects, increased sample heterogeneity, and the ability to perform more sophisticated analyse ...
Journal Article · Spatial Statistics · November 1, 2015
Many data stewards collect confidential data that include fine geography. When sharing these data with others, data stewards strive to disseminate data that are informative for a wide range of spatial and non-spatial analyses while simultaneously protectin ...
Journal Article · Journal of Survey Statistics and Methodology · September 1, 2015
Panel surveys typically suffer from attrition, which can lead to biased inference when basing analysis only on cases that complete all waves of the panel. Unfortunately, the panel data alone cannot inform the extent of the bias due to attrition, so analyst ...
Journal Article · Journal of the American Statistical Association · July 3, 2015
Many statistical organizations collect data that are expected to satisfy linear constraints; as examples, component variables should sum to total variables, and ratios of pairs of variables should be bounded by expert-specified constants. When reported dat ...
Journal Article · Journal of Official Statistics · March 1, 2015
We compare two general strategies for performing statistical disclosure limitation (SDL) for continuous micro data subject to edit rules. In the first, existing SDL methods are applied, and any constraint-violating values they produce are replaced using a ...
Journal Article · Political Analysis · January 1, 2015
Panel studies typically suffer from attrition. Ignoring the attrition can result in biased inferences if the missing data are systematically related to outcomes of interest. Unfortunately, panel data alone cannot inform the extent of bias due to attrition. ...
Journal Article · Multivariate behavioral research · January 2015
Complex research questions often cannot be addressed adequately with a single data set. One sensible alternative to the high cost and effort associated with the creation of large new data sets is to combine existing data sets containing variables related t ...
Journal Article · Journal of Computational and Graphical Statistics · October 25, 2014
In multivariate categorical data, models based on conditional independence assumptions, such as latent class models, offer efficient estimation of complex dependencies. However, Bayesian versions of latent structure models for categorical data typically do ...
Journal Article · Clinical anatomy (New York, N.Y.) · September 2014
To study anxiety levels in first-year medical students taking gross anatomy. Thirty medical students per year, for 2 years, completed the Beck Anxiety Inventory (BAI) 10 times during a 13-week gross anatomy course. In addition, behavioral observations were ...
Journal Article · Journal of Business and Economic Statistics · July 3, 2014
Many statistical agencies, survey organizations, and research centers collect data that suffer from item nonresponse and erroneous or inconsistent values. These data may be required to satisfy linear constraints, for example, bounds on individual variables ...
Journal Article · Statistics in medicine · May 2014
Data that include fine geographic information, such as census tract or street block identifiers, can be difficult to release as public use files. Fine geography provides information that ill-intentioned data users can use to identify individuals. We propos ...
Journal Article · J Chem Educ · February 11, 2014
We developed the Alcohol Pharmacology Education Partnership (APEP), a set of modules designed to integrate a topic of interest (alcohol) with concepts in chemistry and biology for high school students. Chemistry and biology teachers (n = 156) were recruite ...
Journal Article · Survey Methodology · January 1, 2014
We propose an approach for multiple imputation of items missing at random in large-scale surveys with exclusively categorical variables that have structural zeros. Our approach is to use mixtures of multinomial distributions as imputation engines, accounti ...
Journal Article · Statistical Journal of the IAOS · January 1, 2014
In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. Agencies potentially can manage these risks by releasing synthetic micr ...
Conference · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2014
We present an approach for evaluating disclosure risks for fully synthetic categorical data. The basic idea is to compute probability distributions of unknown confidential data values given the synthetic data and assumptions about intruder knowledge. We us ...
Journal Article · American Statistician · December 17, 2013
In typical implementations of multiple imputation for missing data, analysts create m completed datasets based on approximately independent draws of imputation model parameters. We use theoretical arguments and simulations to show that, provided m is large, ...
Journal Article · Journal of Educational and Behavioral Statistics · October 2013
In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and ...
Journal Article · Bayesian analysis · June 2013
Multinomial outcomes with many levels can be challenging to model. Information typically accrues slowly with increasing sample size, yet the parameter space expands rapidly with additional covariates. Shrinking all regression parameters towards zero, as of ...
Journal Article · Statistics and Computing · May 1, 2013
When multiple data owners possess records on different subjects with the same set of attributes (known as horizontally partitioned data), the data owners can improve analyses by concatenating their databases. However, concatenation of data may be infeasible b ...
Journal Article · Statistical Science · May 1, 2013
Panel studies typically suffer from attrition, which reduces sample size and can result in biased inferences. It is impossible to know whether or not the attrition causes bias from the observed panel data alone. Refreshment samples-new, randomly sampled re ...
Chapter · January 1, 2013
Those who generate data - for example, official statistics agencies, survey organizations, and principal investigators, henceforth all called agencies - have a long history of providing access to their data to researchers, policy analysts, decision makers, ...
Journal Article · Journal of Educational and Behavioral Statistics · January 1, 2013
In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and ...
Journal Article · Journal of the American Statistical Association · December 2012
Statistical agencies and other organizations that disseminate data are obligated to protect data subjects' confidentiality. For example, ill-intentioned individuals might link data subjects to records in other databases by matching on common characteristic ...
Journal Article · Transactions on Data Privacy · December 1, 2012
We compare the disclosure risk criterion of ε-differential privacy with a criterion based on probabilities that intruders uncover actual values given the released data. To do so, we generate fully synthetic data that satisfy ε-differential privacy at diffe ...
Journal Article · Journal of the American Statistical Association · August 2, 2012
Investigators often change how variables are measured during the middle of data-collection, for example, in hopes of obtaining greater accuracy or reducing costs. The resulting data comprise sets of observations measured on two (or more) different scales, ...
Journal Article · Survey Methodology · June 1, 2012
To create public use files from large scale surveys, statistical agencies sometimes release random subsamples of the original records. Random subsampling reduces file sizes for secondary data analysts and reduces risks of unintended disclosures of survey p ...
Journal Article · Statistics in medicine · May 2012
Within causal inference, principal stratification (PS) is a popular approach for dealing with intermediate variables, that is, variables affected by treatment that also potentially affect the response. However, when there exists unmeasured confounding in t ...
Journal Article · Statistica Sinica · April 1, 2012
In data fusion, data owners seek to combine datasets with disjoint observations and distinct variables to estimate relationships among the variables. One approach is to concatenate the files, specify models relating the variables not jointly observed, and ...
Journal Article · Biometrics · March 2012
We describe a Bayesian quantile regression model that uses a confirmatory factor structure for part of the design matrix. This model is appropriate when the covariates are indicators of scientifically determined latent factors, and it is these latent facto ...
Journal Article · Public Opinion Quarterly · March 1, 2012
When sharing microdata (i.e., data on individuals) with the public, organizations face competing objectives. On the one hand, they strive to release data files that are useful for a wide range of statistical purposes and easy for secondary data users to an ...
Journal Article · Ethnicity & disease · January 2012
Objectives: Black women have increased risk of preterm birth compared to white women, and overall black women are in poorer health than white women. Recent recommendations to reduce preterm birth have focused on preconception health care. We explore ...
Journal Article · January 1, 2012
Statistical organizations that release data to the public typically are required to protect the confidentiality of survey respondents' identities and attribute values. Removing direct identifiers such as names and addresses generally is not sufficient to e ...
Journal Article · Computational Statistics and Data Analysis · December 1, 2011
Highlights: Statistical agencies can release simulated data as public use files. Nonparametric regression can be adapted to simulate such datasets. Synthesizers using CART, random forests, support vector machines were compared. CART shown to give highest d ...
Journal Article · International Statistical Review · December 1, 2011
In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. One approach with the potential for overcoming these risks is to releas ...
Journal Article · Wiley Interdisciplinary Reviews: Computational Statistics · September 1, 2011
When releasing data to the public, data disseminators typically are required to protect the confidentiality of survey respondents' identities and attribute values. Removing direct identifiers such as names and addresses generally is not sufficient to elimi ...
Journal Article · Journal of Statistical Theory and Practice · June 1, 2011
Multiple imputation is a common approach for handling missing data. It is also used by government agencies to protect confidential information in public use data files. One reason for the popularity of multiple imputation approaches is ease of use: Analyst ...
Journal Article · J Chem Educ · June 1, 2011
Few studies demonstrate the impact of teaching chemistry embedded in a context that has relevance to high school students. We build upon our prior work showing that pharmacology topics (i.e., drugs), which are inherently interesting to high school students ...
Journal Article · Statistics in medicine · March 2011
In many observational studies, analysts estimate causal effects using propensity scores, e.g. by matching, sub-classifying, or inverse probability weighting based on the scores. Estimation of propensity scores is complicated when some values of the covaria ...
Journal Article · Review of Economic Dynamics · January 1, 2011
We build up from the plant level an "aggregate(d)" Solow residual by estimating every U.S. manufacturing plant's contribution to the change in aggregate final demand between 1976 and 1996. Our framework uses the Petrin and Levinsohn (2010) definition of ag ...
Journal Article · Journal of the American Statistical Association · December 1, 2010
Many statistical agencies disseminate samples of census microdata, that is, data on individual records, to the public. Before releasing the microdata, agencies typically alter identifying or sensitive values to protect data subjects' confidentiality, for e ...
Journal ArticleAmerican journal of epidemiology · November 2010
Multiple imputation is particularly well suited to deal with missing data in large epidemiologic studies, because typically these studies support a wide range of analyses by many data users. Some of these analyses may involve complex modeling, including in ...
Journal ArticleAmerican Statistician · May 1, 2010
This article is aimed at practitioners who plan to use Bayesian inference on multiply-imputed datasets in settings where posterior distributions of the parameters of interest are not approximately Gaussian. We seek to steer practitioners away from a naive ...
Open Access
Journal ArticleTransactions on Data Privacy · April 1, 2010
Several national statistical agencies are now releasing partially synthetic, public use microdata. These comprise the units in the original database with sensitive or identifying values replaced with values simulated from statistical models. Specifying syn ...
Journal ArticleStatistica Sinica · January 1, 2010
To protect the confidentiality of survey respondents' identities and sensitive attributes, statistical agencies can release data in which confidential values are replaced with multiple imputations. These are called synthetic data. We propose a two-stage ap ...
Open Access
Journal ArticleJournal of Official Statistics · December 1, 2009
Statistical agencies that disseminate data to the public must protect the confidentiality of respondents' identities and sensitive attributes. To satisfy these requirements, agencies can release the units originally surveyed with some values, such as sensi ...
Journal ArticleNeurotoxicology · November 2009
Extensive research shows that blacks, those of low socioeconomic status, and other disadvantaged groups continue to exhibit poorer school performance compared with middle and upper-class whites in the United States' educational system. Environmental exposu ...
Journal ArticleInternational Statistical Review · August 1, 2009
In data integration contexts, two statistical agencies seek to merge their separate databases into one file. The agencies also may seek to disseminate data to the public based on the integrated file. These goals may be complicated by the agencies' need to ...
Journal ArticleJournal of the Royal Statistical Society. Series A: Statistics in Society · April 1, 2009
Statistical agencies that own different databases on overlapping subjects can benefit greatly from combining their data. These benefits are passed on to secondary data analysts when the combined data are disseminated to the public. Sometimes combining data ...
Journal ArticleJournal of Official Statistics · March 1, 2009
Reluctance of statistical agencies and other data owners to share possibly confidential or proprietary data with others who own related databases is a serious impediment to conducting mutually beneficial analyses. In this article, we propose a protocol for ...
Journal ArticleComputational Statistics and Data Analysis · February 15, 2009
To protect confidentiality, statistical agencies typically alter data before releasing them to the public. Ideally, although generally not done, the agency also provides a way for secondary data analysts to assess the quality of inferences obtained with th ...
Scholarly Edition · 2009
contributions to aggregate productivity growth over this period. While reallocation is important for aggregate productivity growth, it contributes little to fluctuations in aggregate productivity growth at business cycle frequencies. Almost all of the vol ...
Journal ArticleStata Journal · January 1, 2009
We propose improvements to existing degrees of freedom used for significance testing of multivariate hypotheses in small samples when missing data are handled using multiple imputation. The improvements are for 1) tests based on unrestricted fractions of m ...
Journal ArticleBiometrika · December 1, 2008
When some of the records used to estimate the imputation models in multiple imputation are not used or available for analysis, the usual multiple imputation variance estimator has positive bias. We present an alternative approach that enables unbiased esti ...
Journal ArticleStatistics in medicine · August 2008
Propensity score matching is often used in observational studies to create treatment and control groups with similar distributions of observed covariates. Typically, propensity scores are estimated using logistic regressions that assume linearity between t ...
Journal ArticleAnnals of epidemiology · July 2008
Purpose: We explored associations between intendedness of pregnancy with maternal prenatal behaviors, including smoking, use of alcohol, use of illicit drugs, and late initiation of prenatal care. Methods: Pregnant black women ages 18 years or ...
Journal ArticleStatistics and Probability Letters · January 1, 2008
Multiple imputation can handle missing data and disclosure limitation simultaneously. First, fill in the missing data to generate m completed datasets, then replace confidential values in each completed dataset with r imputations. I investigate how to sele ...
Journal ArticleLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2008
Partially synthetic data comprise the units originally surveyed with some collected values, such as sensitive values at high risk of disclosure or values of key identifiers, replaced with multiple draws from statistical models. Because the original records ...
Journal ArticleJournal of the American Statistical Association · December 1, 2007
Multiple imputation was first conceived as a tool that statistical agencies could use to handle nonresponse in large-sample public use surveys. In the last two decades, the multiple-imputation framework has been adapted for other statistical contexts. For ...
Journal ArticleComputational Statistics and Data Analysis · August 15, 2007
When several data owners possess data on different records but the same variables, known as horizontally partitioned data, the owners can improve statistical inferences by sharing their data with each other. Often, however, the owners are unwilling or unab ...
Journal ArticleTechnometrics · August 1, 2007
In industrial and government settings, there is often a need to perform statistical analyses that require data stored in multiple distributed databases. However, the barriers to literally integrating these data can be substantial, even insurmountable. In t ...
Journal ArticleBiometrika · June 1, 2007
When performing multi-component significance tests with multiply-imputed datasets, analysts can use a Wald-like test statistic and a reference F-distribution. The currently employed degrees of freedom in the denominator of this F-distribution are derived a ...
Journal ArticleThe Journal of urology · June 2007
Purpose: The serotonin 5-hydroxytryptamine(1A/7) receptor agonist (R)-8-OH-DPAT (8-hydroxy-2-(di-n-propylamino)tetralin) (Sigma) and the 5-hydroxytryptamine(1A/1B/1D) agonist GR-46611 (3-[3-(2-dimethylaminoethyl)-1H-indol-5-yl]-N-(4-methoxybenzyl)ac ...
Journal ArticleJ Womens Health (Larchmt) · May 2007
OBJECTIVES: Depressive symptoms are common among women, especially those who are of childbearing age or are pregnant. Prior studies have suggested that an increased burden of depressive symptoms is associated with diminished health and functional status, b ...
Journal ArticlePsychosom Med · 2007
OBJECTIVE: To focus on the relationship between pregnancy-related anxiety and spontaneous preterm birth. Psychosocial factors have been the subject of inquiries about the etiology of preterm birth; a factor of recent interest is maternal prenatal pregnancy ...
Journal Article · December 1, 2006
A continuing need in the contexts of homeland security, national defense, and counterterrorism is for statistical analyses that "integrate" data stored in multiple, distributed databases. There is some belief, for example, that integration of data from fli ...
Journal ArticleAmerican Statistician · August 1, 2006
When releasing data to the public, statistical agencies and survey organizations typically alter data values in order to protect the confidentiality of survey respondents' identities and attribute values. To select among the wide variety of data alteration ...
Journal ArticleStatistics in medicine · July 2006
In causal studies without random assignment of treatment, causal effects can be estimated using matched treated and control samples, where matches are obtained using estimated propensity scores. Propensity score matching can reduce bias in treatment effect ...
ConferenceLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2006
Statistical agencies alter values of identifiers to protect respondents’ confidentiality. When these identifiers are survey design variables, leaving the original survey weights on the file can be a disclosure risk. Additionally, the original weights may n ...
Journal ArticleJournal of the American Statistical Association · December 1, 2005
When statistical agencies release microdata to the public, malicious users (intruders) may be able to link records in the released data to records in external databases. Releasing data in ways that fail to prevent such identifications may discredit the age ...
Journal ArticleJournal of Statistical Computation and Simulation · November 1, 2005
Owing to the growing concerns over data confidentiality, many national statistical agencies are considering remote access servers to disseminate data to the public. With remote servers, users submit requests for output from statistical models fit using the ...
Journal ArticleJournal of computer-aided molecular design · September 2005
We present a method for performing statistically valid linear regressions on the union of distributed chemical databases that preserves confidentiality of those databases. The method employs secure multi-party computation to share local sufficient statisti ...
Journal ArticleJournal of Computational and Graphical Statistics · June 1, 2005
This article presents several methods for performing linear regression on the union of distributed databases that preserve, to varying degrees, confidentiality of those databases. Such methods can be used by federal or state statistical agencies to share i ...
Journal ArticleJournal of Statistical Planning and Inference · May 1, 2005
To limit the risks of disclosures when releasing data to the public, it has been suggested that statistical agencies release multiply imputed, synthetic microdata. For example, the released microdata can be fully synthetic, comprising random samples of uni ...
Journal ArticleStatistical Science · May 1, 2005
Given the public's ever-increasing concerns about data confidentiality, in the near future statistical agencies may be unable or unwilling, or even may not be legally allowed, to release any genuine microdata - data on individual units, such as individuals ...
Journal ArticleJournal of the Royal Statistical Society. Series A: Statistics in Society · February 8, 2005
The paper presents an illustration and empirical study of releasing multiply imputed, fully synthetic public use microdata. Simulations based on data from the US Current Population Survey are used to evaluate the potential validity of inferences based on f ...
Journal ArticleIndustrial and Labor Relations Review · January 1, 2005
Quantitative industrial relations research frequently relies on data collected from large surveys of establishments that use complex sampling designs, such as stratified and unequal probability sampling. The authors analyze two complex surveys of establish ...
Journal ArticleThe Journal of pharmacology and experimental therapeutics · September 2004
The serotonin (5-hydroxytryptamine1A) 5-HT1A receptor agonist 8-OH-DPAT [(R)-(+)-8-hydroxy-2-(di-n-propylamino)tetralin] inhibits bladder activity under nociceptive but not innocuous conditions in cats with an intact spinal cord, suggestive of an effect o ...
Journal ArticleThe Journal of urology · August 2004
PURPOSE: Antagonists of alpha 1-adrenergic receptors (alpha 1ARs) relieve obstructive and irritative symptoms in patients with bladder outlet obstruction. However, to our knowledge mechanisms underlying the relief of irritative symptoms remain unknown. Bec ...
Journal ArticleKDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining · January 1, 2004
Reluctance of data owners to share their possibly confidential or proprietary data with others who own related databases is a serious impediment to conducting a mutually beneficial data mining analysis. We address the case of vertically partitioned data - ...
Journal ArticleStatistics and Computing · October 1, 2003
To protect public-use microdata, one approach is not to allow users access to the microdata. Instead, users submit analyses to a remote computer that reports back basic output from the fitted model, such as coefficients and standard errors. To be most usef ...
Journal ArticleJournal of Survey Statistics and Methodology
Multivariate categorical data nested within households often include reported values that fail edit constraints (for example, a participating household reports a child's age as older than his biological parent's age) as well as missing values. Generally, ...
Open Access
Journal ArticleSurvey Methodology
We present an approach for imputation of missing items in multivariate categorical data nested within households. The approach relies on a latent class model that (i) allows for household-level and individual-level variables, (ii) ensures that impossible ...
Open Access