Rebecca Carter Steorts

Associate Professor of Statistical Science
Statistical Science

Selected Publications


Minimal Impact of Implemented Early Warning Score and Best Practice Alert for Patient Deterioration.

Journal Article Crit Care Med · January 2019 OBJECTIVES: Previous studies have looked at National Early Warning Score performance in predicting in-hospital deterioration and death, but data are lacking with respect to patient outcomes following implementation of National Early Warning Score. We sough ...

Unique entity estimation with application to the Syrian conflict

Journal Article Annals of Applied Statistics · June 1, 2018 Entity resolution identifies and removes duplicate entities in large, noisy databases and has grown in both usage and new developments as a result of increased data availability. Nevertheless, entity resolution has tradeoffs regarding assumptions of the da ...

Generalized Bayesian record linkage and regression with exact error propagation

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2018 Record linkage (de-duplication or entity resolution) is the process of merging noisy databases to remove duplicate entities. While record linkage removes duplicate entities from such databases, the downstream task is any inferential, predictive, or post-li ...

Probabilistic blocking with an application to the Syrian conflict

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2018 Entity resolution seeks to merge databases as to remove duplicate entries where unique identifiers are typically unknown. We review modern blocking approaches for entity resolution, focusing on those based upon locality sensitive hashing (LSH). First, we i ...

Bayesian learning of dynamic multilayer networks

Journal Article Journal of Machine Learning Research · April 1, 2017 A plethora of networks is being collected in a growing number of fields, including disease transmission, international relations, social interactions, and others. As data streams continue to grow, the complexity associated with these highly multidimensiona ...

A Bayesian Approach to Graphical Record Linkage and Deduplication

Journal Article Journal of the American Statistical Association · October 1, 2016 We propose an unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation involves the representation of the pattern of links between records as a bipartite grap ... Open Access

Flexible models for microclustering with application to entity resolution

Conference Advances in Neural Information Processing Systems · January 1, 2016 Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman-Yor process mixture models make ...

Microclustering: When the Cluster Sizes Grow Sublinearly with the Size of the Data Set

Journal Article · December 2, 2015 Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman-Yor process mixture models make ... Open Access

Regularized brain reading with shrinkage and smoothing

Journal Article Annals of Applied Statistics · December 1, 2015 Functional neuroimaging measures how the brain responds to complex stimuli. However, sample sizes are modest, noise is substantial, and stimuli are high dimensional. Hence, direct estimates are inherently imprecise and call for regularization. We compare a ...

Blocking Methods Applied to Casualty Records from the Syrian Conflict

Journal Article · October 26, 2015 Estimation of death counts and associated standard errors is of great importance in armed conflict such as the ongoing violence in Syria, as well as historical conflicts in Guatemala, Perú, Colombia, Timor Leste, and Kosovo. For example, statistical esti ...

Entity resolution with empirically motivated priors

Journal Article Bayesian Analysis · January 1, 2015 Databases often contain corrupted, degraded, and noisy data with duplicate entries across and within each database. Such problems arise in citations, medical databases, genetics, human rights databases, and a variety of other applied settings. The target o ...

Comments on: “Single and two-stage cross-sectional and time series benchmarking procedures for small area estimation”

Journal Article Test · December 1, 2014 We congratulate the authors for a stimulating and valuable manuscript, providing a careful review of the state-of-the-art in cross-sectional and time-series benchmarking procedures for small area estimation. They develop a novel two-stage benchmarking meth ...

Smoothing, Clustering, and Benchmarking for Small Area Estimation

Journal Article · October 26, 2014 We develop constrained Bayesian estimation methods for small area problems: those requiring smoothness with respect to similarity across areas, such as geographic proximity or clustering by covariates; and benchmarking constraints, requiring (weighted) mea ...

Variational Bayes for Merging Noisy Databases

Journal Article · October 17, 2014 Bayesian entity resolution merges together multiple, noisy databases and returns the minimal collection of unique individuals represented, together with their true, latent record values. Bayesian methods allow flexible generative models that share power ac ... Open Access

Discussion of "Single and Two-Stage Cross-Sectional and Time Series Benchmarking Procedures for SAE"

Journal Article · May 25, 2014 We congratulate the authors for a stimulating and valuable manuscript, providing a careful review of the state-of-the-art in cross-sectional and time-series benchmarking procedures for small area estimation. They develop a novel two-stage benchmarking meth ...

A comparison of blocking methods for record linkage

Journal Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2014 Record linkage seeks to merge databases and to remove duplicates when unique identifiers are not available. Most approaches use blocking techniques to reduce the computational complexity associated with record linkage. We review traditional blocking techni ...

SMERED: A Bayesian approach to graphical record linkage and de-duplication

Journal Article Journal of Machine Learning Research · January 1, 2014 We propose a novel unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation is to represent the pattern of links between records as a bipartite graph, in whic ... Open Access

Two-stage benchmarking as applied to small area estimation

Journal Article Test · November 1, 2013 There has been recent growth in small area estimation due to the need for more precise estimation of small geographic areas, which has led to groups such as the U.S. Census Bureau, Google, and the RAND corporation utilizing small area-estimation procedures ...

Trouble With The Curve: Improving MLB Pitch Classification

Journal Article · April 5, 2013 The PITCHf/x database has allowed the statistical analysis of Major League Baseball (MLB) to flourish since its introduction in late 2006. Using PITCHf/x, pitches have been classified by hand, requiring considerable effort, or using neural network clust ...

Bayesian benchmarking with applications to small area estimation

Journal Article Test · November 1, 2011 It is well-known that small area estimation needs explicit or at least implicit use of models (cf. Rao in Small Area Estimation, Wiley, New York, 2003). These model-based estimates can differ widely from the direct estimates, especially for areas with very ...