Journal Article · Crit Care Med · January 2019
OBJECTIVES: Previous studies have looked at National Early Warning Score performance in predicting in-hospital deterioration and death, but data are lacking with respect to patient outcomes following implementation of National Early Warning Score. We sough ...
Journal Article · Annals of Applied Statistics · June 1, 2018
Entity resolution identifies and removes duplicate entities in large, noisy databases and has grown in both usage and new developments as a result of increased data availability. Nevertheless, entity resolution has tradeoffs regarding assumptions of the da ...
Conference · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2018
Record linkage (de-duplication or entity resolution) is the process of merging noisy databases to remove duplicate entities. While record linkage removes duplicate entities from such databases, the downstream task is any inferential, predictive, or post-li ...
Conference · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2018
Entity resolution seeks to merge databases so as to remove duplicate entries where unique identifiers are typically unknown. We review modern blocking approaches for entity resolution, focusing on those based upon locality sensitive hashing (LSH). First, we i ...
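The LSH-based blocking idea described in this abstract can be illustrated with a small MinHash banding sketch. This is a generic illustration, not code from the paper: the shingle size, number of bands, hash choice, and the toy records below are all assumptions made for the example. Records whose signatures agree on at least one band land in the same candidate block, so expensive pairwise comparison is only done within blocks.

```python
import hashlib
from collections import defaultdict

def shingles(s, k=3):
    """Character k-shingles of a normalized string (lowercased, spaces removed)."""
    s = s.lower().replace(" ", "")
    return {s[i:i + k] for i in range(max(1, len(s) - k + 1))}

def minhash_signature(sh, num_hashes=20):
    """MinHash signature: for each seed, keep the smallest hash over the shingles."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int(hashlib.md5(f"{seed}:{g}".encode()).hexdigest(), 16)
            for g in sh
        ))
    return sig

def lsh_blocks(records, bands=5, rows=4):
    """Group record indices whose signatures agree on at least one band."""
    blocks = defaultdict(set)
    for idx, rec in enumerate(records):
        sig = minhash_signature(shingles(rec), num_hashes=bands * rows)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            blocks[key].add(idx)
    # Keep only blocks that actually put two or more records together.
    return [ids for ids in blocks.values() if len(ids) > 1]

records = ["Jon A. Smith", "John A Smith", "Maria Garcia"]
print(lsh_blocks(records))
```

Tuning `bands` and `rows` trades recall against block size: more bands with fewer rows each makes weakly similar records more likely to share a block.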
Journal Article · Journal of Machine Learning Research · April 1, 2017
A plethora of networks is being collected in a growing number of fields, including disease transmission, international relations, social interactions, and others. As data streams continue to grow, the complexity associated with these highly multidimensiona ...
Journal Article · Journal of the American Statistical Association · October 1, 2016
We propose an unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation involves the representation of the pattern of links between records as a bipartite grap ...
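The bipartite-graph representation this abstract describes can be sketched as a plain data structure: records on one side, latent entities on the other, with each record linked to exactly one latent entity. Everything below (the toy files, the hand-fixed linkage, the function names) is illustrative; in the actual model the linkage is inferred, not assigned by hand.

```python
from collections import defaultdict

# Hypothetical toy data: two files of noisy name records.
files = [
    ["John Smith", "Mary Jones", "J. Smith"],   # file 0
    ["Jon Smith", "Mary Jone"],                 # file 1
]

# A linkage structure assigns each record (file, index) a latent-entity id.
# Here it is fixed by hand purely to illustrate the data structure.
linkage = {
    (0, 0): 0, (0, 2): 0, (1, 0): 0,   # three records -> latent entity 0
    (0, 1): 1, (1, 1): 1,              # two records  -> latent entity 1
}

def coreferent(linkage, r1, r2):
    """Two records refer to the same individual iff linked to one latent entity."""
    return linkage[r1] == linkage[r2]

def entity_clusters(linkage):
    """Invert the bipartite graph: latent entity -> the records linked to it."""
    clusters = defaultdict(list)
    for rec, ent in linkage.items():
        clusters[ent].append(rec)
    return dict(clusters)

print(coreferent(linkage, (0, 0), (1, 0)))   # True
print(entity_clusters(linkage))
```

A nice property of this representation is that coreference is defined through a shared latent entity, so it is automatically transitive: no post-hoc reconciliation of pairwise match decisions is needed.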
Conference · Advances in Neural Information Processing Systems · January 1, 2016
Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman-Yor process mixture models make ...
Journal Article · December 2, 2015
Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman-Yor process mixture models make ...
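The linear-growth behavior these abstracts refer to can be checked empirically with a short Chinese restaurant process simulation (a generic illustration, not code from the paper): under a Dirichlet process prior the largest table's share of customers does not shrink toward zero as n grows, which is exactly the wrong behavior for record-linkage clusters that should stay small no matter how many records arrive.

```python
import random

def crp_cluster_sizes(n, alpha=1.0, seed=0):
    """Simulate a Chinese restaurant process with concentration alpha
    and return the list of table (cluster) sizes."""
    rng = random.Random(seed)
    sizes = []
    for i in range(n):
        # Customer i joins table k w.p. sizes[k]/(i+alpha),
        # or starts a new table w.p. alpha/(i+alpha).
        r = rng.uniform(0, i + alpha)
        acc = 0.0
        for k, size_k in enumerate(sizes):
            acc += size_k
            if r < acc:
                sizes[k] += 1
                break
        else:
            sizes.append(1)
    return sizes

for n in (100, 1000, 10000):
    sizes = crp_cluster_sizes(n)
    print(n, max(sizes), max(sizes) / n)
```

Running this shows the largest cluster occupying a roughly stable fraction of the data as n increases, motivating "microclustering" priors whose cluster sizes grow sublinearly.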
Journal Article · Annals of Applied Statistics · December 1, 2015
Functional neuroimaging measures how the brain responds to complex stimuli. However, sample sizes are modest, noise is substantial, and stimuli are high dimensional. Hence, direct estimates are inherently imprecise and call for regularization. We compare a ...
Journal Article · October 26, 2015
Estimation of death counts and associated standard errors is of great importance in armed conflict such as the ongoing violence in Syria, as well as historical conflicts in Guatemala, Perú, Colombia, Timor Leste, and Kosovo. For example, statistical esti ...
Journal Article · Bayesian Analysis · January 1, 2015
Databases often contain corrupted, degraded, and noisy data with duplicate entries across and within each database. Such problems arise in citations, medical databases, genetics, human rights databases, and a variety of other applied settings. The target o ...
Journal Article · Test · December 1, 2014
We congratulate the authors for a stimulating and valuable manuscript, providing a careful review of the state-of-the-art in cross-sectional and time-series benchmarking procedures for small area estimation. They develop a novel two-stage benchmarking meth ...
Journal Article · October 26, 2014
We develop constrained Bayesian estimation methods for small area problems: those requiring smoothness with respect to similarity across areas, such as geographic proximity or clustering by covariates; and benchmarking constraints, requiring (weighted) mea ...
Journal Article · October 17, 2014
Bayesian entity resolution merges together multiple, noisy databases and returns the minimal collection of unique individuals represented, together with their true, latent record values. Bayesian methods allow flexible generative models that share power ac ...
Journal Article · May 25, 2014
We congratulate the authors for a stimulating and valuable manuscript, providing a careful review of the state-of-the-art in cross-sectional and time-series benchmarking procedures for small area estimation. They develop a novel two-stage benchmarking meth ...
Journal Article · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2014
Record linkage seeks to merge databases and to remove duplicates when unique identifiers are not available. Most approaches use blocking techniques to reduce the computational complexity associated with record linkage. We review traditional blocking techni ...
Journal Article · Journal of Machine Learning Research · January 1, 2014
We propose a novel unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation is to represent the pattern of links between records as a bipartite graph, in whic ...
Journal Article · Test · November 1, 2013
There has been recent growth in small area estimation due to the need for more precise estimation of small geographic areas, which has led to groups such as the U.S. Census Bureau, Google, and the RAND Corporation utilizing small area estimation procedures ...
Journal Article · April 5, 2013
The PITCHf/x database has allowed the statistical analysis of Major League Baseball (MLB) to flourish since its introduction in late 2006. Using PITCHf/x, pitches have been classified by hand, requiring considerable effort, or using neural network clust ...
Journal Article · Test · November 1, 2011
It is well-known that small area estimation needs explicit or at least implicit use of models (cf. Rao in Small Area Estimation, Wiley, New York, 2003). These model-based estimates can differ widely from the direct estimates, especially for areas with very ...