Jerome P. Reiter

Journal Article Statistics and Computing · October 1, 2025 Probabilistic record linkage is often used to match records from two files, in particular when the variables common to both files comprise identifiers measured with occasional errors like names and demographic variables. We consider bipartite record linkag ...

Efficient and Scalable Bipartite Matching with Fast Beta Linkage (fabl)

Journal Article Bayesian Analysis · September 1, 2025 Within the field of record linkage, Bayesian methods have the crucial advantage of quantifying uncertainty from imperfect linkages. However, current implementations of Bayesian Fellegi-Sunter models are computationally intensive, making them challenging to ...

Differentially private estimation of weighted average treatment effects for binary outcomes

Journal Article Computational Statistics and Data Analysis · July 1, 2025 In the social and health sciences, researchers often make causal inferences using sensitive variables. These researchers, as well as the data holders themselves, may be ethically and perhaps legally obligated to protect the confidentiality of study partici ...

Imputation of nonignorable missing data in surveys using auxiliary margins via hot deck and sequential imputation

Journal Article Survey Methodology · June 1, 2025 Survey data collection often is plagued by unit and item nonresponse. To reduce reliance on strong assumptions about the missingness mechanisms, statisticians can use information about population marginal distributions known, for example, from censuses or ...

The association between long-term PM2.5 exposure and risk for pancreatic cancer: an application of social informatics.

Journal Article American Journal of Epidemiology · March 4, 2025 There is a profound need to identify modifiable risk factors to screen and prevent pancreatic cancer. Air pollution, including fine particulate matter (PM2.5), is increasingly recognized as a risk factor for cancer. We conducted a case-control study using ...

Studying Chinese immigrants' spatial distribution in the Raleigh-Durham area by linking survey and commercial data using romanized names.

Journal Article Journal of the Royal Statistical Society, Series A (Statistics in Society) · January 2025 Many population surveys do not provide information on respondents' residential addresses, instead offering coarse geographies like zip code or higher aggregations. However, fine resolution geography can be beneficial for characterizing neighbourhoods, espe ...

Differentially Private Verification of Survey-Weighted Estimates

Journal Article Transactions on Data Privacy · January 1, 2025 Several official statistics agencies release synthetic data as public use microdata files. In practice, synthetic data do not admit accurate results for every analysis. Thus, it is beneficial for agencies to provide users with feedback on the quality of th ...

Changes to the Journal of Privacy and Confidentiality

Journal Article Journal of Privacy and Confidentiality · January 1, 2025

Evaluating Binary Outcome Classifiers Estimated from Survey Data.

Journal Article Epidemiology · November 2024 Surveys are commonly used to facilitate research in epidemiology, health, and the social and behavioral sciences. Often, these surveys are not simple random samples, and respondents are given weights reflecting their probability of selection into the surve ...

Assessing Statistical Disclosure Risk for Differentially Private, Hierarchical Count Data, with Application to the 2020 US Decennial Census

Journal Article Statistica Sinica · October 1, 2024

Launching the Society for Privacy and Confidentiality Research to Publish the Journal of Privacy and Confidentiality

Journal Article Journal of Privacy and Confidentiality · August 27, 2024 We describe the launching of the Society for Privacy and Confidentiality Research (SPCR). SPCR is the new owner of the Journal of Privacy and Confidentiality, with the goal of ensuring a sustainable future for the Journal, and continuing to publish the mul ...

Bayesian Inference Under Differential Privacy: Prior Selection Considerations with Application to Univariate Gaussian Data and Regression

Preprint · May 22, 2024

Toward a 21st century national data infrastructure: Managing privacy and confidentiality risks with blended data

Book · April 25, 2024 Protecting privacy and ensuring confidentiality in data is a critical component of modernizing our national data infrastructure. The use of blended data, which combines previously collected data sources, presents new considerations for responsible data stewar ...

Regression-Assisted Bayesian Record Linkage for Causal Inference in Observational Studies with Covariates Spread Over Two Files.

Journal Article Journal of Statistical Planning and Inference · March 2024 We consider causal inference for observational studies with data spread over two files. One file includes the treatment, outcome, and some covariates measured on a set of individuals, and the other file includes additional causally-relevant covariates meas ...

Reply to Muralidhar et al., Kenny et al., and Hotz et al.: The benefits of engagement with external research teams.

Journal Article Proceedings of the National Academy of Sciences of the United States of America · March 2024

Using Auxiliary Marginal Distributions in Imputations for Nonresponse while Accounting for Survey Weights, with Application to Estimating Voter Turnout

Journal Article Journal of Survey Statistics and Methodology · February 1, 2024 In many survey settings, population counts or percentages are available for some of the variables in the survey, for example, from censuses, administrative databases, or other high-quality surveys. We present a model-based approach to utilize such auxiliar ...

Differentially Private Methods for Releasing Results of Stability Analyses

Journal Article American Statistician · January 1, 2024 Data stewards and analysts can promote transparent and trustworthy science and policy-making by facilitating assessments of the sensitivity of published results to alternate analysis choices. For example, researchers may want to assess whether the results ...

Fully Synthetic Data for Complex Surveys.

Journal Article Survey Methodology · January 2024 When seeking to release public use files for confidential data, statistical agencies can generate fully synthetic data. We propose an approach for making fully synthetic data from surveys collected with complex sampling designs. Our approach adheres to the ...

Prior-itizing Privacy: A Bayesian Approach to Setting the Privacy Budget in Differential Privacy

Conference Advances in Neural Information Processing Systems · January 1, 2024 When releasing outputs from confidential data, agencies need to balance the analytical usefulness of the released data with the obligation to protect data subjects' confidentiality. For releases satisfying differential privacy, this balance is reflected by ...

ER-Evaluation: End-to-End Evaluation of Entity Resolution Systems

Journal Article Journal of Open Source Software · November 11, 2023

An in-depth examination of requirements for disclosure risk assessment.

Journal Article Proceedings of the National Academy of Sciences of the United States of America · October 2023 The use of formal privacy to protect the confidentiality of responses in the 2020 Decennial Census of Population and Housing has triggered renewed interest and debate over how to measure the disclosure risks and societal benefits of the published data prod ...

Prior-itizing Privacy: A Bayesian Approach to Setting the Privacy Budget in Differential Privacy

Preprint · June 19, 2023

Synthetic Data: A Look Back and A Look Forward

Journal Article Transactions on Data Privacy · January 1, 2023 When initially proposed, synthetic data for disclosure control was generally dismissed as unlikely to be implemented in practice. Thirty years later, synthetic data are becoming a staple of the disclosure limitation toolkit. We now see synthetic public use ...

A Latent Class Modeling Approach for Generating Synthetic Data and Making Posterior Inferences from Differentially Private Counts

Journal Article Journal of Privacy and Confidentiality · July 29, 2022 Several algorithms exist for creating differentially private counts from contingency tables, such as two-way or three-way marginal counts. The resulting noisy counts generally do not correspond to a coherent contingency table, so that some post-processing ...

Bayesian Inference for Estimating Subset Proportions using Differentially Private Counts

Journal Article Journal of Survey Statistics and Methodology · June 1, 2022 Recently, several organizations have considered using differentially private algorithms for disclosure limitation when releasing count data. The typical approach is to add random noise to the counts sampled from, for example, a Laplace distribution or symm ...

Multiple Imputation Inference with Integer-Valued Point Estimates

Journal Article American Statistician · January 1, 2022 We consider settings where an analyst of multiply imputed data desires an integer-valued point estimate and an associated interval estimate, for example, a count of the number of individuals with certain characteristics in a population. Even when the point ...

Bayesian Causal Inference with Bipartite Record Linkage

Journal Article Bayesian Analysis · January 1, 2022 In some scenarios, the observational data needed for causal inferences are spread over two data files. In particular, we consider scenarios where one file includes covariates and the treatment measured on a set of individuals, and a second file includes re ...

Leveraging Auxiliary Information on Marginal Distributions in Nonignorable Models for Item and Unit Nonresponse.

Journal Article Journal of the Royal Statistical Society, Series A (Statistics in Society) · April 2021 Often, government agencies and survey organizations know the population counts or percentages for some of the variables in a survey. These may be available from auxiliary sources, for example, administrative databases or other high quality surveys. We pres ...

Post-processing differentially private counts to satisfy additive constraints

Journal Article Transactions on Data Privacy · April 1, 2021 To reduce disclosure risks, statistical agencies and other organizations can release noisy counts that satisfy differential privacy. In some contexts, the released counts satisfy additive constraints; for example, the released value of a total should equa ...

Leveraging random assignment to impute missing covariates in causal studies

Journal Article Journal of Statistical Computation and Simulation · January 1, 2021 Baseline covariates in randomized experiments are often used in the estimation of treatment effects, for example, when estimating treatment effects within covariate-defined subgroups. In practice, however, covariate values may be missing for some data subj ...

Administrative Records for Survey Methodology

Book · January 1, 2021 Addresses the international use of administrative records for large-scale surveys, censuses, and other statistical purposes. Administrative Records for Survey Methodology is a comprehensive guide to improving t ...

Assessing Uncertainty When Using Linked Administrative Records

Chapter · January 1, 2021 Linking subjects in a planned medical study to records in administrative databases, such as electronic health records and Medicare claims data, can enable researchers to evaluate long-term outcomes, as well as outcomes not measured in the planned study, wi ...

Preface

Book · January 1, 2021

Bayesian Mixture Modeling for Multivariate Conditional Distributions

Journal Article Journal of Statistical Theory and Practice · September 1, 2020 We present a Bayesian mixture model for estimating the joint distribution of mixed ordinal, nominal, and continuous data conditional on a set of fixed variables. The modeling strategy is motivated by applied contexts in marketing and the social sciences, i ...

Wilson Confidence Intervals for Binomial Proportions With Multiple Imputation for Missing Data

Journal Article American Statistician · April 2, 2020 We present a Wilson interval for binomial proportions for use with multiple imputation for missing data. Using simulation studies, we show that it can have better repeated sampling properties than the usual confidence interval for binomial proportions base ...

Use of Hospital Referral Regions in Evaluating End-of-Life Care.

Journal Article Journal of Palliative Medicine · January 2020 Background: Hospital referral regions (HRRs) are often used to characterize inpatient referral patterns, but it is unknown how well these geographic regions are aligned with variation in Medicare-financed hospice care, which is largely provided at home. Ob ...

Bayesian Modeling for Simultaneous Regression and Record Linkage

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2020 Often data analysts use probabilistic record linkage techniques to match records across two data sets. Such matching can be the primary goal, or it can be a necessary step to analyze relationships among the variables in the data sets. We propose a Bayesian ...

Sequentially additive nonignorable missing data modelling using auxiliary marginal information

Journal Article Biometrika · December 1, 2019 We study a class of missingness mechanisms, referred to as sequentially additive nonignorable, for modelling multivariate data with item nonresponse. These mechanisms explicitly allow the probability of nonresponse for each variable to depend on the value ...

Effects of a Government-Academic Partnership: Has the NSF-CENSUS Bureau Research Network Helped Improve the US Statistical System?

Journal Article Journal of Survey Statistics and Methodology · December 1, 2019 The National Science Foundation-Census Bureau Research Network (NCRN) was established in 2011 to create interdisciplinary research nodes on methodological questions of interest and significance to the broader research community and to the Federal Statistic ...

Multiple imputation of missing values in household data with structural zeros

Journal Article Survey Methodology · June 1, 2019

Data fusion for correcting measurement errors

Journal Article Journal of Survey Statistics and Methodology · June 1, 2019 Often in surveys, key items are subject to measurement errors. Given just the data, it can be difficult to determine the extent and distribution of this error process and, hence, to obtain accurate inferences that involve the error-prone variables. In some ...

Differentially Private Significance Tests for Regression Coefficients

Report · April 3, 2019 Many data producers seek to provide users access to confidential data without unduly compromising data subjects’ privacy and confidentiality. One general strategy is to require users to do analyses without seeing the confidential data; for example, analyst ...

Differential privacy and federal data releases

Journal Article Annual Review of Statistics and Its Application · March 7, 2019 Federal statistics agencies strive to release data products that are informative for many purposes, yet also protect the privacy and confidentiality of data subjects' identities and sensitive attributes. This article reviews the role that differential priv ...

Differentially private verification of regression predictions from synthetic data

Journal Article Transactions on Data Privacy · December 1, 2018 One approach for releasing public use files is to make synthetic data, i.e., data simulated from statistical models estimated on the confidential data. Given access only to synthetic data, users cannot tell whether the synthetic data have been constructed ...

Regression Modeling and File Matching Using Possibly Erroneous Matching Variables

Journal Article Journal of Computational and Graphical Statistics · October 2, 2018 Many analyses require linking records from two databases comprising overlapping sets of individuals. In the absence of unique identifiers, the linkage procedure often involves matching on a set of categorical variables, such as demographics, common to both ...

Simultaneous record linkage and causal inference with propensity score subclassification.

Journal Article Statistics in Medicine · October 2018 We develop methodology for causal inference in observational studies when using propensity score subclassification on data constructed with probabilistic record linkage techniques. We focus on scenarios where covariates and binary treatment assignments are ...

Sequential identification of nonignorable missing data mechanisms

Journal Article Statistica Sinica · October 1, 2018 With nonignorable missing data, likelihood-based inference should be based on the joint distribution of the study variables and their missingness indicators. These joint models cannot be estimated from the data alone, thus requiring the analyst to impose r ...

Predicting Length of Hospice Stay: An Application of Quantile Regression.

Journal Article Journal of Palliative Medicine · August 2018 BACKGROUND: Use of the Medicare hospice benefit has been associated with high-quality care at the end of life, and hospice length of use in particular has been used as a proxy for appropriate timing of hospice enrollment. Quantile regression has been under ...

Imputation in U.S. manufacturing data and its implications for productivity dispersion

Journal Article Review of Economics and Statistics · July 1, 2018 In the U.S. Census Bureau's 2002 and 2007 Censuses of Manufactures, 79% and 73% of observations, respectively, have imputed data for at least one variable used to compute total factor productivity (TFP). The bureau primarily imputes for missing values usin ...

Introduction to the special section on missing data

Journal Article Statistical Science · May 1, 2018

Improving Bayesian mixture models for multiple imputation of missing data using focused clustering

Journal Article REVSTAT Statistical Journal · April 1, 2018 We present a joint modeling approach for multiple imputation of missing continuous and categorical variables using Bayesian mixture models. The approach extends the idea of focused clustering, in which one separates variables into two sets before estimatin ...

Sensorimotor abilities predict on-field performance in professional baseball.

Journal Article Scientific Reports · January 8, 2018 Baseball players must be able to see and react in an instant, yet it is hotly debated whether superior performance is associated with superior sensorimotor abilities. In this study, we compare sensorimotor abilities, measured through 8 psychomotor tasks co ...

Simultaneous edit-imputation and disclosure limitation for business establishment data

Journal Article Journal of Applied Statistics · January 2, 2018 Business establishment microdata typically are required to satisfy agency-specified edit rules, such as balance equations and linear inequalities. Inevitably some establishments' reported data violate the edit rules. Statistical agencies correct faulty val ...

Visual abilities distinguish pitchers from hitters in professional baseball.

Journal Article Journal of Sports Sciences · January 2018 This study aimed to evaluate the possibility that differences in sensorimotor abilities exist between hitters and pitchers in a large cohort of baseball players of varying levels of experience. Secondary data analysis was performed on 9 sensorimotor tasks ...

Providing Access to Confidential Research Data Through Synthesis and Verification: An Application to Data on Employees of the U.S. Federal Government

Report · 2018

Is my model any good: differentially private regression diagnostics

Journal Article Knowledge and Information Systems · January 1, 2018 Linear and logistic regression are popular statistical techniques for analyzing multi-variate data. Typically, analysts do not simply posit a particular form of the regression model, estimate its parameters, and use the results for inference or prediction. ...

Dirichlet process mixture models for modeling and generating synthetic versions of nested categorical data

Journal Article Bayesian Analysis · January 1, 2018 We present a Bayesian model for estimating the joint distribution of multivariate categorical data when units are nested within groups. Such data arise frequently in social science settings, for example, people living in households. The model assumes that ...

Creating linked datasets for SME energy-assessment evidence-building: Results from the U.S. Industrial Assessment Center Program

Journal Article Energy Policy · December 1, 2017 Lack of information is commonly cited as a market failure resulting in an energy-efficiency gap. Government information policies to fill this gap may enable improvements in energy efficiency and social welfare because of the externalities of energy use. Th ...

Bayesian Simultaneous Edit and Imputation for Multivariate Categorical Data

Journal Article Journal of the American Statistical Association · October 2, 2017 In categorical data, it is typically the case that some combinations of variables are theoretically impossible, such as a 3-year-old child who is married or a man who is pregnant. In practice, however, reported values often include such structural zeros du ...

Discussion

Journal Article Statistica Sinica · October 1, 2017

God, Devil and Guru in the Land of Multiple Imputation: Discussion

Journal Article Statistica Sinica · October 1, 2017

Stop or continue data collection: A nonignorable missing data approach for continuous variables

Journal Article Journal of Official Statistics · September 1, 2017 We present an approach to inform decisions about nonresponse follow-up sampling. The basic idea is (i) to create completed samples by imputing nonrespondents' data under various assumptions about the nonresponse mechanisms, (ii) take hypothetical samples o ...

Protecting Confidentiality in Cancer Registry Data With Geographic Identifiers.

Journal Article American Journal of Epidemiology · July 2017 The National Cancer Institute's Surveillance, Epidemiology, and End Results Program releases research files of cancer registry data. These files include geographic information at the county level, but no finer. Access to finer geography, such as census tra ...

A Framework for Sharing Confidential Research Data, Applied to Investigating Differential Pay by Race in the U.S. Government

Journal Article · June 2017

An Empirical Comparison of Multiple Imputation Methods for Categorical Data

Journal Article American Statistician · April 3, 2017 Multiple imputation is a common approach for dealing with missing values in statistical databases. The imputer fills in missing values with draws from predictive models estimated from the observed data, resulting in multiple, completed versions of the data ...

Itemwise conditionally independent nonresponse modelling for incomplete multivariate data

Journal Article Biometrika · March 1, 2017 We introduce a nonresponse mechanism for multivariate missing data in which each study variable and its nonresponse indicator are conditionally independent given the remaining variables and their nonresponse indicators. This is a nonignorable missingness m ...

Bayesian mixture models with focused clustering for mixed ordinal and nominal data

Journal Article Bayesian Analysis · January 1, 2017 In some contexts, mixture models can fit certain variables well at the expense of others in ways beyond the analyst's control. For example, when the data include some variables with non-trivial amounts of missing values, the mixture model may fit the margi ...

Categorical data fusion using auxiliary information

Journal Article Annals of Applied Statistics · December 1, 2016 In data fusion, analysts seek to combine information from two databases comprised of disjoint sets of individuals, in which some variables appear in both databases and other variables appear in only one database. Most data fusion techniques rely on variant ...

Multiple Imputation of Missing Categorical and Continuous Values via Bayesian Mixture Models With Local Dependence

Journal Article Journal of the American Statistical Association · October 1, 2016 We present a nonparametric Bayesian joint model for multivariate continuous and categorical variables, with the intention of developing a flexible engine for multiple imputation of missing values. The model fuses Dirichlet process mixtures of multinomial d ...

Differentially private regression diagnostics

Conference Proceedings of the IEEE International Conference on Data Mining (ICDM) · July 2, 2016 Linear and logistic regression are popular statistical techniques for analyzing multi-variate data. Typically, analysts do not simply posit a particular form of the regression model, estimate its parameters, and use the results for inference or prediction. ...

Incorporating marginal prior information in latent class models

Journal Article Bayesian Analysis · June 1, 2016 We present an approach to incorporating informative prior beliefs about marginal probabilities into Bayesian latent class models for categorical data. The basic idea is to append synthetic observations to the original data such that (i) the empirical distr ...

Bayesian latent pattern mixture models for handling attrition in panel studies with refreshment samples

Journal Article Annals of Applied Statistics · March 1, 2016 Many panel studies collect refreshment samples: new, randomly sampled respondents who complete the questionnaire at the same time as a subsequent wave of the panel. With appropriate modeling, these samples can be leveraged to correct inferences for biases c ...

Synthetic establishment microdata around the world

Journal Article Statistical Journal of the IAOS · February 27, 2016 In contrast to the many public-use microdata samples available for individual and household data from many statistical agencies around the world, there are virtually no establishment or firm microdata available. In large part, this difficulty in providing ...

Releasing synthetic magnitude microdata constrained to fixed marginal totals

Journal Article Statistical Journal of the IAOS · February 27, 2016 We present approaches to generating synthetic microdata for multivariate data that take on non-negative integer values, such as magnitude data in economic surveys. The basic idea is to estimate a mixture of Poisson distributions to describe the multivariat ...

Assessing disclosure risks for synthetic data with arbitrary intruder knowledge

Journal Article Statistical Journal of the IAOS · February 27, 2016 Several statistical agencies release synthetic microdata, i.e., data with all confidential values replaced with draws from statistical models, in order to protect data subjects' confidentiality. While fully synthetic data are safe from record linkage attac ...

Semi-parametric Selection Models for Potentially Non-ignorable Attrition in Panel Studies with Refreshment Samples

Journal Article Political Analysis · December 2015

Multiple imputation for harmonizing longitudinal non-commensurate measures in individual participant data meta-analysis.

Journal Article Statistics in Medicine · November 2015 There are many advantages to individual participant data meta-analysis for combining data from multiple studies. These advantages include greater power to detect effects, increased sample heterogeneity, and the ability to perform more sophisticated analyse ...

Bayesian marked point process modeling for generating fully synthetic public use data with point-referenced geography

Journal Article Spatial Statistics · November 1, 2015 Many data stewards collect confidential data that include fine geography. When sharing these data with others, data stewards strive to disseminate data that are informative for a wide range of spatial and non-spatial analyses while simultaneously protectin ...

Accounting for nonignorable unit nonresponse and attrition in panel studies with refreshment samples

Journal Article Journal of Survey Statistics and Methodology · September 1, 2015 Panel surveys typically suffer from attrition, which can lead to biased inference when basing analysis only on cases that complete all waves of the panel. Unfortunately, the panel data alone cannot inform the extent of the bias due to attrition, so analyst ...

Simultaneous Edit-Imputation for Continuous Microdata

Journal Article Journal of the American Statistical Association · July 3, 2015 Many statistical organizations collect data that are expected to satisfy linear constraints; as examples, component variables should sum to total variables, and ratios of pairs of variables should be bounded by expert-specified constants. When reported dat ...

Statistical disclosure limitation in the presence of edit rules

Journal Article Journal of Official Statistics · March 1, 2015 We compare two general strategies for performing statistical disclosure limitation (SDL) for continuous micro data subject to edit rules. In the first, existing SDL methods are applied, and any constraint-violating values they produce are replaced using a ...

Semi-parametric selection models for potentially non-ignorable attrition in panel studies with refreshment samples

Journal Article Political Analysis · January 1, 2015 Panel studies typically suffer from attrition. Ignoring the attrition can result in biased inferences if the missing data are systematically related to outcomes of interest. Unfortunately, panel data alone cannot inform the extent of bias due to attrition. ...

A Nonparametric, Multiple Imputation-Based Method for the Retrospective Integration of Data Sets.

Journal Article Multivariate Behavioral Research · January 2015 Complex research questions often cannot be addressed adequately with a single data set. One sensible alternative to the high cost and effort associated with the creation of large new data sets is to combine existing data sets containing variables related t ...

Bayesian Estimation of Discrete Multivariate Latent Structure Models With Structural Zeros

Journal Article Journal of Computational and Graphical Statistics · October 25, 2014 In multivariate categorical data, models based on conditional independence assumptions, such as latent class models, offer efficient estimation of complex dependencies. However, Bayesian versions of latent structure models for categorical data typically do ...

Anxiety in first year medical students taking gross anatomy.

Journal Article Clinical Anatomy (New York, N.Y.) · September 2014 To study anxiety levels in first-year medical students taking gross anatomy. Thirty medical students per year, for 2 years, completed the Beck Anxiety Inventory (BAI) 10 times during a 13-week gross anatomy course. In addition, behavioral observations were ...

Using Imputation Techniques to Evaluate Stopping Rules in Adaptive Survey Designs

Journal Article · September 1, 2014

Multiple Imputation of Missing or Faulty Values Under Linear Constraints

Journal Article Journal of Business and Economic Statistics · July 3, 2014 Many statistical agencies, survey organizations, and research centers collect data that suffer from item nonresponse and erroneous or inconsistent values. These data may be required to satisfy linear constraints, for example, bounds on individual variables ...

Imputation of confidential data sets with spatial locations using disease mapping models.

Journal Article Statistics in Medicine · May 2014 Data that include fine geographic information, such as census tract or street block identifiers, can be difficult to release as public use files. Fine geography provides information that ill-intentioned data users can use to identify individuals. We propos ...

Alcohol Pharmacology Education Partnership: Using Chemistry and Biology Concepts To Educate High School Students about Alcohol.

Journal Article J Chem Educ · February 11, 2014 We developed the Alcohol Pharmacology Education Partnership (APEP), a set of modules designed to integrate a topic of interest (alcohol) with concepts in chemistry and biology for high school students. Chemistry and biology teachers (n = 156) were recruite ...

Improving the Synthetic Longitudinal Business Database

Journal Article · February 1, 2014

Bayesian multiple imputation for large-scale categorical data with structural zeros

Journal Article Survey Methodology · January 1, 2014 We propose an approach for multiple imputation of items missing at random in large-scale surveys with exclusively categorical variables that have structural zeros. Our approach is to use mixtures of multinomial distributions as imputation engines, accounti ...

Disclosure risk evaluation for fully synthetic categorical data

Journal Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2014 We present an approach for evaluating disclosure risks for fully synthetic categorical data. The basic idea is to compute probability distributions of unknown confidential data values given the syntheti ...

SynLBD 2.0: Improving the synthetic Longitudinal Business Database

Journal Article Statistical Journal of the IAOS · January 1, 2014 In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. Agencies potentially can manage these risks by releasing synthetic micr ...

Disclosure risk evaluation for fully synthetic categorical data

Conference Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics · January 1, 2014 We present an approach for evaluating disclosure risks for fully synthetic categorical data. The basic idea is to compute probability distributions of unknown confidential data values given the synthetic data and assumptions about intruder knowledge. We us ...

Are independent parameter draws necessary for multiple imputation?

Journal Article American Statistician · December 17, 2013 In typical implementations of multiple imputation for missing data, analysts create m completed datasets based on approximately independent draws of imputation model parameters. We use theoretical arguments and simulations to show that, provided m is large, ...

Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys

Journal Article Journal of Educational and Behavioral Statistics · October 2013 In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and ...

Multiple-Shrinkage Multinomial Probit Models with Applications to Simulating Geographies in Public Use Data.

Journal Article Bayesian Analysis · June 2013 Multinomial outcomes with many levels can be challenging to model. Information typically accrues slowly with increasing sample size, yet the parameter space expands rapidly with additional covariates. Shrinking all regression parameters towards zero, as of ...

Secure Bayesian model averaging for horizontally partitioned data

Journal Article Statistics and Computing · May 1, 2013 When multiple data owners possess records on different subjects with the same set of attributes, known as horizontally partitioned data, the data owners can improve analyses by concatenating their databases. However, concatenation of data may be infeasible b ...

Handling attrition in longitudinal studies: The case for refreshment samples

Journal Article Statistical Science · May 1, 2013 Panel studies typically suffer from attrition, which reduces sample size and can result in biased inferences. It is impossible to know whether or not the attrition causes bias from the observed panel data alone. Refreshment samples, that is, new, randomly sampled re ...

Using Statistics to Protect Privacy

Chapter · January 1, 2013 Those who generate data - for example, official statistics agencies, survey organizations, and principal investigators, henceforth all called agencies - have a long history of providing access to their data to researchers, policy analysts, decision makers, ...

Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys

Journal Article Journal of Educational and Behavioral Statistics · January 1, 2013 In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and ...

Estimating Identification Disclosure Risk Using Mixed Membership Models.

Journal Article Journal of the American Statistical Association · December 2012 Statistical agencies and other organizations that disseminate data are obligated to protect data subjects' confidentiality. For example, ill-intentioned individuals might link data subjects to records in other databases by matching on common characteristic ...

Differential privacy and statistical disclosure risk measures: An investigation with binary synthetic data

Journal Article Transactions on Data Privacy · December 1, 2012 We compare the disclosure risk criterion of ε-differential privacy with a criterion based on probabilities that intruders uncover actual values given the released data. To do so, we generate fully synthetic data that satisfy ε-differential privacy at diffe ...

Discussion

Journal Article International Statistical Review · December 1, 2012

Nonparametric Bayesian multiple imputation for missing data due to mid-study switching of measurement methods

Journal Article Journal of the American Statistical Association · August 2, 2012 Investigators often change how variables are measured during the middle of data collection, for example, in hopes of obtaining greater accuracy or reducing costs. The resulting data comprise sets of observations measured on two (or more) different scales, ...

Combining synthetic data with subsampling to create public use microdata files for large scale surveys

Journal Article Survey Methodology · June 1, 2012 To create public use files from large scale surveys, statistical agencies sometimes release random subsamples of the original records. Random subsampling reduces file sizes for secondary data analysts and reduces risks of unintended disclosures of survey p ...

The Alcohol Pharmacology Education Partnership: Educating High School Students about Alcohol

Conference Alcoholism: Clinical and Experimental Research · June 1, 2012

Sensitivity analysis for unmeasured confounding in principal stratification settings with binary variables.

Journal Article Statistics in Medicine · May 2012 Within causal inference, principal stratification (PS) is a popular approach for dealing with intermediate variables, that is, variables affected by treatment that also potentially affect the response. However, when there exists unmeasured confounding in t ...

Bayesian finite population imputation for data fusion

Journal Article Statistica Sinica · April 1, 2012 In data fusion, data owners seek to combine datasets with disjoint observations and distinct variables to estimate relationships among the variables. One approach is to concatenate the files, specify models relating the variables not jointly observed, and ...

Modeling adverse birth outcomes via confirmatory factor quantile regression.

Journal Article Biometrics · March 2012 We describe a Bayesian quantile regression model that uses a confirmatory factor structure for part of the design matrix. This model is appropriate when the covariates are indicators of scientifically determined latent factors, and it is these latent facto ...

Research synthesis: Statistical approaches to protecting confidentiality for microdata and their effects on the quality of statistical inferences

Journal Article Public Opinion Quarterly · March 1, 2012 When sharing microdata (i.e., data on individuals) with the public, organizations face competing objectives. On the one hand, they strive to release data files that are useful for a wide range of statistical purposes and easy for secondary data users to an ...

Plant-Level Productivity and Imputation of Missing Data in U.S. Census Manufacturing Data

Journal Article · February 2012

Protecting Data Confidentiality in Publicly Released Datasets: Approaches Based on Multiple Imputation

Chapter · 2012

A comparison of two methods of estimating propensity scores after multiple imputation

Journal Article Statistical Methods in Medical Research · 2012

Multiple imputation for sharing precise geographies in public use data

Journal Article Annals of Applied Statistics · 2012

Towards providing automated feedback on the quality of inferences from synthetic datasets

Journal Article Journal of Privacy and Confidentiality · 2012

Inferentially valid, partially synthetic data: Generating from posterior predictive distributions not necessary

Journal Article Journal of Official Statistics · 2012

Maternal health prior to pregnancy and preterm birth among urban, low income black women in Baltimore: the Baltimore Preterm Birth Study.

Journal Article Ethnicity & Disease · January 2012 Objectives: Black women have increased risk of preterm birth compared to white women, and overall black women are in poorer health than white women. Recent recommendations to reduce preterm birth have focused on preconception health care. We explore ...

Nonparametric Bayesian multiple imputation of categorical variables in large scale educational assessment surveys

Journal Article Journal of Educational and Behavioral Statistics · 2012

Protecting Data Confidentiality in Publicly Released Datasets: Approaches Based on Multiple Imputation

Journal Article · January 1, 2012 Statistical organizations that release data to the public typically are required to protect the confidentiality of survey respondents' identities and attribute values. Removing direct identifiers such as names and addresses generally is not sufficient to e ...

An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets

Journal Article Computational Statistics and Data Analysis · December 1, 2011 Highlights: Statistical agencies can release simulated data as public use files. Nonparametric regression can be adapted to simulate such datasets. Synthesizers using CART, random forests, support vector machines were compared. CART shown to give highest d ...

Towards Unrestricted Public Use Business Microdata: The Synthetic Longitudinal Business Database

Journal Article International Statistical Review · December 1, 2011 In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. One approach with the potential for overcoming these risks is to releas ...

Data confidentiality

Journal Article Wiley Interdisciplinary Reviews Computational Statistics · September 1, 2011 When releasing data to the public, data disseminators typically are required to protect the confidentiality of survey respondents' identities and attribute values. Removing direct identifiers such as names and addresses generally is not sufficient to elimi ...

Sharing confidential data for research purposes: a primer.

Journal Article Epidemiology (Cambridge, Mass.) · September 2011

A comparison of posterior simulation and inference by combining rules for multiple imputation

Journal Article Journal of Statistical Theory and Practice · June 1, 2011 Multiple imputation is a common approach for handling missing data. It is also used by government agencies to protect confidential information in public use data files. One reason for the popularity of multiple imputation approaches is ease of use: Analyst ...

Teaching High School Chemistry in the Context of Pharmacology Helps Both Teachers and Students Learn.

Journal Article J Chem Educ · June 1, 2011 Few studies demonstrate the impact of teaching chemistry embedded in a context that has relevance to high school students. We build upon our prior work showing that pharmacology topics (i.e., drugs), which are inherently interesting to high school students ...

Estimating propensity scores with missing covariate data using general location mixture models.

Journal Article Statistics in Medicine · March 2011 In many observational studies, analysts estimate causal effects using propensity scores, e.g. by matching, sub-classifying, or inverse probability weighting based on the scores. Estimation of propensity scores is complicated when some values of the covaria ...

Plant-Level Productivity and Imputation of Missing Data in the Census of Manufactures

Journal Article · January 1, 2011

The Impact of Plant-Level Resource Reallocations and Technical Progress on U.S. Macroeconomic Growth

Journal Article · January 2011

The impact of plant-level resource reallocations and technical progress on U.S. macroeconomic growth

Journal Article Review of Economic Dynamics · January 1, 2011 We build up from the plant level an "aggregate(d)" Solow residual by estimating every U.S. manufacturing plant's contribution to the change in aggregate final demand between 1976 and 1996. Our framework uses the Petrin and Levinsohn (2010) definition of ag ...

Sharing confidential data for research purposes: A primer [invited commentary]

Journal Article Epidemiology · 2011

Exploratory quantile regression with many covariates: An application to adverse birth outcomes

Journal Article Epidemiology · 2011

Towards unrestricted public use business microdata: The synthetic Longitudinal Business Database

Journal Article International Statistical Review · 2011

Sensitivity analysis for unmeasured confounding in principal stratification

Journal Article Statistics in Medicine · 2011

Sampling with synthesis: A new approach for releasing public use census microdata

Journal Article Journal of the American Statistical Association · December 1, 2010 Many statistical agencies disseminate samples of census microdata, that is, data on individual records, to the public. Before releasing the microdata, agencies typically alter identifying or sensitive values to protect data subjects' confidentiality, for e ...

Multiple imputation for missing data via sequential regression trees.

Journal Article American Journal of Epidemiology · November 2010 Multiple imputation is particularly well suited to deal with missing data in large epidemiologic studies, because typically these studies support a wide range of analyses by many data users. Some of these analyses may involve complex modeling, including in ...

A note on Bayesian inference after multiple imputation

Journal Article American Statistician · May 1, 2010 This article is aimed at practitioners who plan to use Bayesian inference on multiply-imputed datasets in settings where posterior distributions of the parameters of interest are not approximately Gaussian. We seek to steer practitioners away from a naive ...

Random forests for generating partially synthetic, categorical data

Journal Article Transactions on Data Privacy · April 1, 2010 Several national statistical agencies are now releasing partially synthetic, public use microdata. These comprise the units in the original database with sensitive or identifying values replaced with values simulated from statistical models. Specifying syn ...

Two stage multiple imputation to protect confidentiality

Journal Article Statistica Sinica · 2010

Sampling with synthesis: A new approach for creating public use census microdata

Journal Article Journal of the American Statistical Association · 2010

Model selection when multiple imputation is used to protect confidentiality in public use data

Journal Article Journal of Privacy and Confidentiality · 2010

Tests of multivariate hypotheses when using multiple imputation for missing data and partial synthesis

Journal Article Journal of Official Statistics · 2010

Multiple imputation for disclosure limitation: Future research challenges

Journal Article Journal of Privacy and Confidentiality · 2010

Releasing multiply-imputed synthetic data generated in two stages to protect confidentiality

Journal Article Statistica Sinica · January 1, 2010 To protect the confidentiality of survey respondents' identities and sensitive attributes, statistical agencies can release data in which confidential values are replaced with multiple imputations. These are called synthetic data. We propose a two-stage ap ...

Disclosure risk and data utility for partially synthetic data: An empirical study using the German IAB establishment survey

Journal Article Journal of Official Statistics · December 1, 2009 Statistical agencies that disseminate data to the public must protect the confidentiality of respondents' identities and sensitive attributes. To satisfy these requirements, agencies can release the units originally surveyed with some values, such as sensi ...

Environmental contributors to the achievement gap.

Journal Article Neurotoxicology · November 2009 Extensive research shows that blacks, those of low socioeconomic status, and other disadvantaged groups continue to exhibit poorer school performance compared with middle and upper-class whites in the United States' educational system. Environmental exposu ...

Using multiple imputation to integrate and disseminate confidential microdata

Journal Article International Statistical Review · August 1, 2009 In data integration contexts, two statistical agencies seek to merge their separate databases into one file. The agencies also may seek to disseminate data to the public based on the integrated file. These goals may be complicated by the agencies' need to ...

Multiple imputation for combining confidential data owned by two agencies

Journal Article Journal of the Royal Statistical Society: Series A (Statistics in Society) · April 1, 2009 Statistical agencies that own different databases on overlapping subjects can benefit greatly from combining their data. These benefits are passed on to secondary data analysts when the combined data are disseminated to the public. Sometimes combining data ...

Privacy-preserving analysis of vertically partitioned data using secure matrix products

Journal Article Journal of Official Statistics · March 1, 2009 Reluctance of statistical agencies and other data owners to share possibly confidential or proprietary data with others who own related databases is a serious impediment to conducting mutually beneficial analyses. In this article, we propose a protocol for ...

Verification servers: Enabling analysts to assess the quality of inferences from public use data

Journal Article Computational Statistics and Data Analysis · February 15, 2009 To protect confidentiality, statistical agencies typically alter data before releasing them to the public. Ideally, although generally not done, the agency also provides a way for secondary data analysts to assess the quality of inferences obtained with th ...

Implications of Resource Reallocation and Technical Progress for Macroeconomic Fluctuations: Evidence from Plant-Level U.S. Manufacturing Data

Scholarly Edition · 2009 ... contributions to aggregate productivity growth over this period. While reallocation is important for aggregate productivity growth, it contributes little to fluctuations in aggregate productivity growth at business cycle frequencies. Almost all of the vol ...

Global measures of data utility for microdata masked for disclosure limitation

Journal Article Journal of Privacy and Confidentiality · 2009

Using multiple imputation for data integration and dissemination

Journal Article International Statistical Review · 2009

Improved degrees of freedom for multivariate significance tests obtained from multiply imputed, small-sample data

Journal Article Stata Journal · January 1, 2009 We propose improvements to existing degrees of freedom used for significance testing of multivariate hypotheses in small samples when missing data are handled using multiple imputation. The improvements are for 1) tests based on unrestricted fractions of m ...

Inferences for two stage multiple imputation for nonresponse

Journal Article Journal of Statistical Theory and Practice · 2009

Multiple imputation when records used for imputation are not used or disseminated for analysis

Journal Article Biometrika · December 1, 2008 When some of the records used to estimate the imputation models in multiple imputation are not used or available for analysis, the usual multiple imputation variance estimator has positive bias. We present an alternative approach that enables unbiased esti ...

Nonresponse bias on dimensions of political activity amongst political elites

Journal Article International Journal of Public Opinion Research · November 1, 2008

Estimation of propensity scores using generalized additive models.

Journal Article Statistics in Medicine · August 2008 Propensity score matching is often used in observational studies to create treatment and control groups with similar distributions of observed covariates. Typically, propensity scores are estimated using logistic regressions that assume linearity between t ...

Unintended pregnancy and prenatal behaviors among urban, black women in Baltimore, Maryland: the Baltimore preterm birth study.

Journal Article Annals of Epidemiology · July 2008 Purpose: We explored associations between intendedness of pregnancy and maternal prenatal behaviors, including smoking, use of alcohol, use of illicit drugs, and late initiation of prenatal care. Methods: Pregnant black women ages 18 years or ...

Survey Error

Chapter · January 1, 2008 In survey samples, the primary objective usually is to estimate unknown population quantities, such as population means or totals. The estimates invariably do not exactly equal the population values, i.e. there are survey errors. This article reviews the m ...

Selecting the number of imputed datasets when using multiple imputation for missing data and disclosure limitation

Journal Article Statistics and Probability Letters · January 1, 2008 Multiple imputation can handle missing data and disclosure limitation simultaneously. First, fill in the missing data to generate m completed datasets, then replace confidential values in each completed dataset with r imputations. I investigate how to sele ...

Estimation of propensity scores using generalized additive models

Journal Article Statistics in Medicine · 2008

Statistics in sports: Current and future research trends

Journal Article STAtOR · 2008

A comparison of respondents and non-respondents on dimensions of political activity

Journal Article International Journal of Public Opinion Research · 2008

Accounting for intruder uncertainty due to sampling when estimating identification disclosure risks in partially synthetic data

Journal Article Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics · January 1, 2008 Partially synthetic data comprise the units originally surveyed with some collected values, such as sensitive values at high risk of disclosure or values of key identifiers, replaced with multiple draws from statistical models. Because the original records ...

The multiple adaptations of multiple imputation

Journal Article Journal of the American Statistical Association · December 1, 2007 Multiple imputation was first conceived as a tool that statistical agencies could use to handle nonresponse in large-sample public use surveys. In the last two decades, the multiple-imputation framework has been adapted for other statistical contexts. For ...

Relevance. Pharmacology in the high-school classroom.

Journal Article Science · September 28, 2007

Secure computation with horizontally partitioned data using adaptive regression splines

Journal Article Computational Statistics and Data Analysis · August 15, 2007 When several data owners possess data on different records but the same variables, known as horizontally partitioned data, the owners can improve statistical inferences by sharing their data with each other. Often, however, the owners are unwilling or unab ...

Secure, privacy-preserving analysis of distributed databases

Journal Article Technometrics · August 1, 2007 In industrial and government settings, there is often a need to perform statistical analyses that require data stored in multiple distributed databases. However, the barriers to literally integrating these data can be substantial, even insurmountable. In t ...

Small-sample degrees of freedom for multi-component significance tests with multiple imputation for missing data

Journal Article Biometrika · June 1, 2007 When performing multi-component significance tests with multiply-imputed datasets, analysts can use a Wald-like test statistic and a reference F-distribution. The currently employed degrees of freedom in the denominator of this F-distribution are derived a ...

Effect of 5-hydroxytryptamine1 serotonin receptor agonists on noxiously stimulated micturition in cats with chronic spinal cord injury.

Journal Article The Journal of Urology · June 2007 Purpose: The serotonin 5-hydroxytryptamine(1A/7) receptor agonist (R)-8-OH-DPAT (8-hydroxy-2-(di-n-propylamino)tetralin) (Sigma) and the 5-hydroxytryptamine(1A/1B/1D) agonist GR-46611 (3-[3-(2-dimethylaminoethyl)-1H-indol-5-yl]-N-(4-methoxybenzyl)ac ...

Depressive symptoms and indicators of maternal health status during pregnancy.

Journal Article J Womens Health (Larchmt) · May 2007 OBJECTIVES: Depressive symptoms are common among women, especially those who are of childbearing age or are pregnant. Prior studies have suggested that an increased burden of depressive symptoms is associated with diminished health and functional status, b ...

Depressive symptoms and maternal health during pregnancy

Journal Article Journal of Women's Health · 2007

Activation of the external urethral sphincter central pattern generator by a 5-HT1A serotonin receptor agonist in rats with chronic spinal cord injury

Journal Article American Journal of Physiology: Regulatory, Integrative, and Comparative Physiology · 2007

Pharmacology in the high school classroom

Journal Article Science · 2007 Includes supplemental material. ...

Maternal prenatal pregnancy-related anxiety and spontaneous preterm birth in Baltimore, Maryland.

Journal Article Psychosom Med · 2007 OBJECTIVE: To focus on the relationship between pregnancy-related anxiety and spontaneous preterm birth. Psychosocial factors have been the subject of inquiries about the etiology of preterm birth; a factor of recent interest is maternal prenatal pregnancy ...

Estimating risks of identification disclosure in partially synthetic data

Journal Article Journal of Privacy and Confidentiality · 2007

Secure statistical analysis of distributed databases

Journal Article · December 1, 2006 A continuing need in the contexts of homeland security, national defense, and counterterrorism is for statistical analyses that "integrate" data stored in multiple, distributed databases. There is some belief, for example, that integration of data from fli ...

A framework for evaluating the utility of data altered to protect confidentiality

Journal Article American Statistician · August 1, 2006 When releasing data to the public, statistical agencies and survey organizations typically alter data values in order to protect the confidentiality of survey respondents' identities and attribute values. To select among the wide variety of data alteration ...

Interval estimation for treatment effects using propensity score matching.

Journal Article Statistics in Medicine · July 2006 In causal studies without random assignment of treatment, causal effects can be estimated using matched treated and control samples, where matches are obtained using estimated propensity scores. Propensity score matching can reduce bias in treatment effect ...

Professional development in pharmacology for high school teachers improves their students' scores in biology & chemistry

Conference FASEB Journal · March 7, 2006

The importance of modeling the sampling design in multiple imputation for missing data

Journal Article Survey Methodology · 2006

Adjusting survey weights when altering identifying design variables via synthetic data

Conference Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics · January 1, 2006 Statistical agencies alter values of identifiers to protect respondents’ confidentiality. When these identifiers are survey design variables, leaving the original survey weights on the file can be a disclosure risk. Additionally, the original weights may n ...

Estimating risks of identification disclosure in microdata

Journal Article Journal of the American Statistical Association · December 1, 2005 When statistical agencies release microdata to the public, malicious users (intruders) may be able to link records in the released data to records in external databases. Releasing data in ways that fail to prevent such identifications may discredit the age ...

Categorical data regression diagnostics for remote access servers

Journal Article Journal of Statistical Computation and Simulation · November 1, 2005 Owing to the growing concerns over data confidentiality, many national statistical agencies are considering remote access servers to disseminate data to the public. With remote servers, users submit requests for output from statistical models fit using the ...

Secure analysis of distributed chemical databases without data integration

Journal Article Journal of Computer-Aided Molecular Design · September 2005 We present a method for performing statistically valid linear regressions on the union of distributed chemical databases that preserves confidentiality of those databases. The method employs secure multi-party computation to share local sufficient statisti ...

A Comparison of Experimental and Observational Data Analyses

Chapter · July 14, 2005

Secure regression on distributed databases

Journal Article Journal of Computational and Graphical Statistics · June 1, 2005 This article presents several methods for performing linear regression on the union of distributed databases that preserve, to varying degrees, confidentiality of those databases. Such methods can be used by federal or state statistical agencies to share i ...

Significance tests for multi-component estimands from multiply imputed, synthetic microdata

Journal Article Journal of Statistical Planning and Inference · May 1, 2005 To limit the risks of disclosures when releasing data to the public, it has been suggested that statistical agencies release multiply imputed, synthetic microdata. For example, the released microdata can be fully synthetic, comprising random samples of uni ...

Data dissemination and disclosure limitation in a world without microdata: A risk-utility framework for remote access analysis servers

Journal Article Statistical Science · May 1, 2005 Given the public's ever-increasing concerns about data confidentiality, in the near future statistical agencies may be unable or unwilling, or even may not be legally allowed, to release any genuine microdata - data on individual units, such as individuals ...

Releasing multiply imputed, synthetic public use microdata: An illustration and empirical study

Journal Article Journal of the Royal Statistical Society: Series A (Statistics in Society) · February 8, 2005 The paper presents an illustration and empirical study of releasing multiply imputed, fully synthetic public use microdata. Simulations based on data from the US Current Population Survey are used to evaluate the potential validity of inferences based on f ...

Analytical modeling in complex surveys of work practices

Journal Article Industrial and Labor Relations Review · January 1, 2005 Quantitative industrial relations research frequently relies on data collected from large surveys of establishments that use complex sampling designs, such as stratified and unequal probability sampling. The authors analyze two complex surveys of establish ...

Inhibition of bladder activity by 5-hydroxytryptamine1 serotonin receptor agonists in cats with chronic spinal cord injury

Journal Article The Journal of Pharmacology and Experimental Therapeutics · September 2004 The serotonin (5-hydroxytryptamine1A) 5-HT1A receptor agonist 8-OH-DPAT [(R)-(+)-8-hydroxy-2-(di-n-propylamino)tetralin] inhibits bladder activity under nociceptive but not innocuous conditions in cats with an intact spinal cord, suggestive of an effect o ...

Effects of alpha 1-adrenergic receptor subtype selective antagonists on lower urinary tract function in rats with bladder outlet obstruction

Journal Article The Journal of Urology · August 2004 PURPOSE: Antagonists of alpha 1-adrenergic receptors (alpha 1ARs) relieve obstructive and irritative symptoms in patients with bladder outlet obstruction. However, to our knowledge mechanisms underlying the relief of irritative symptoms remain unknown. Bec ...

1727: Effects of Alpha-1 Adrenergic Receptor Subtype-Selective Antagonists on Lower Urinary Tract Function in Rats with Bladder Outlet Obstruction

Conference Journal of Urology · April 2004

Should teams walk or pitch to Barry Bonds?

Journal Article Baseball Research Journal · 2004

New approaches to data dissemination: A glimpse into the future (?)

Journal Article Chance · 2004

Analysis of integrated data without data integration

Journal Article Chance · 2004

Simultaneous use of multiple imputation for missing data and disclosure limitation

Journal Article Survey Methodology · 2004

Privacy preserving regression modelling via distributed computation

Journal Article KDD 2004: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining · January 1, 2004 Reluctance of data owners to share their possibly confidential or proprietary data with others who own related databases is a serious impediment to conducting a mutually beneficial data mining analysis. We address the case of vertically partitioned data - ...

Synthetic diagnostics for remote access regression servers

Journal Article Statistics and Computing · October 2003

Model diagnostics for remote access regression servers

Journal Article Statistics and Computing · October 1, 2003 To protect public-use microdata, one approach is not to allow users access to the microdata. Instead, users submit analyses to a remote computer that reports back basic output from the fitted model, such as coefficients and standard errors. To be most usef ...

Multiple imputation for statistical disclosure limitation

Journal Article Journal of Official Statistics · January 2003

Inference for partially synthetic, public use microdata sets

Journal Article Survey Methodology · 2003

Generalized linear model diagnostics for remote access servers

Conference ASC 2003: The Impact of Technology on the Survey Process · January 1, 2003

Satisfying disclosure restrictions with synthetic data sets

Journal Article Journal of Official Statistics · December 2002

Borrowing strength when explicit data pooling is prohibited

Journal Article Journal of Official Statistics · 2000

Using statistics to determine causal relationships

Journal Article American Mathematical Monthly · January 1, 2000

Simultaneous Edit and Imputation for Household Data with Structural Zeros

Journal Article Journal of Survey Statistics and Methodology Multivariate categorical data nested within households often include reported values that fail edit constraints (for example, a participating household reports a child's age as older than his biological parent's age) as well as missing values. Generally, ...

Multiple Imputation of Missing Values in Household Data with Structural Zeros

Journal Article Survey Methodology We present an approach for imputation of missing items in multivariate categorical data nested within households. The approach relies on a latent class model that (i) allows for household-level and individual-level variables, (ii) ensures that impossible ...