Ashwinkumar Venkatanaga Machanavajjhala

Adjunct Associate Professor of Computer Science
Computer Science

Selected Publications


Privately Answering Queries on Skewed Data via Per-Record Differential Privacy

Conference Proceedings of the VLDB Endowment · January 1, 2024 We consider the problem of the private release of statistics (like payroll) where it is critical to preserve the contribution made by a small number of outlying large entities. We propose a privacy formalism, per-record zero concentrated differential priv ... Full text Cite

HDMM: Optimizing Error of High-Dimensional Statistical Queries under Differential Privacy

Journal Article Journal of Privacy and Confidentiality · August 31, 2023 In this work we describe the High-Dimensional Matrix Mechanism (HDMM), a differentially private algorithm for answering a workload of predicate counting queries. HDMM represents query workloads using a compact implicit matrix representation and exploits th ... Full text Cite

R2T: Instance-optimal Truncation for Differentially Private Query Evaluation with Foreign Keys

Journal Article SIGMOD Record · June 8, 2023 Answering SPJA queries under differential privacy (DP), including graph pattern counting under node-DP as an important special case, has received considerable attention in recent years. The dual challenge of foreign-key constraints and self-joins is partic ... Full text Cite

PreFair: Privately Generating Justifiably Fair Synthetic Data

Conference Proceedings of the VLDB Endowment · January 1, 2023 When a database is protected by Differential Privacy (DP), its usability is limited in scope. In this scenario, generating a synthetic version of the data that mimics the properties of the private data allows users to perform any operation on the syntheti ... Full text Cite

Longshot: Indexing Growing Databases using MPC and Differential Privacy

Journal Article Proceedings of the VLDB Endowment · January 1, 2023 In this work, we propose Longshot, a novel design for secure outsourced database systems that supports ad-hoc queries through the use of secure multi-party computation and differential privacy. By combining these two techniques, we build and maintain data ... Full text Cite

Explaining Differentially Private Query Results With DPXPlain

Conference Proceedings of the VLDB Endowment · January 1, 2023 Employing Differential Privacy (DP), the state-of-the-art privacy standard, to answer aggregate database queries poses new challenges for users to understand the trends and anomalies observed in the query results: Is the unexpected answer due to the data i ... Full text Cite

Private Proof-of-Stake Blockchains using Differentially-Private Stake Distortion

Conference 32nd USENIX Security Symposium, USENIX Security 2023 · January 1, 2023 Safety, liveness, and privacy are three critical properties for any private proof-of-stake (PoS) blockchain. However, prior work (SP’21) has shown that to obtain safety and liveness, a PoS blockchain must in theory forgo privacy. Specifically, to ensure sa ... Cite

DP-PQD: Privately Detecting Per-Query Gaps In Synthetic Data Generated By Black-Box Mechanisms

Journal Article Proceedings of the VLDB Endowment · January 1, 2023 Synthetic data generation methods, and in particular, private synthetic data generation methods, are gaining popularity as a means to make copies of sensitive databases that can be shared widely for research and data analysis. Some of the fundamental opera ... Full text Cite

Transitioning from testbeds to ships: an experience study in deploying the TIPPERS Internet of Things platform to the US Navy

Journal Article Journal of Defense Modeling and Simulation · July 1, 2022 This paper describes the collaborative effort between privacy and security researchers at nine different institutions along with researchers at the Naval Information Warfare Center to deploy, test, and demonstrate privacy-preserving technologies in creatin ... Full text Cite

IncShrink: Architecting Efficient Outsourced Databases using Incremental MPC and Differential Privacy

Conference Proceedings of the ACM SIGMOD International Conference on Management of Data · June 10, 2022 In this paper, we consider secure outsourced growing databases (SOGDB) that support view-based query answering. These databases allow untrusted servers to privately maintain a materialized view. This allows servers to use only the materialized view for que ... Full text Cite

R2T: Instance-optimal Truncation for Differentially Private Query Evaluation with Foreign Keys

Conference Proceedings of the ACM SIGMOD International Conference on Management of Data · June 10, 2022 Answering SPJA queries under differential privacy (DP), including graph pattern counting under node-DP as an important special case, has received considerable attention in recent years. The dual challenge of foreign-key constraints and self-joins is partic ... Full text Cite

DPXPlain: Privately Explaining Aggregate Query Answers

Conference Proceedings of the VLDB Endowment · January 1, 2022 Differential privacy (DP) is the state-of-the-art and rigorous notion of privacy for answering aggregate database queries while preserving the privacy of sensitive information in the data. In today’s era of data analysis, however, it poses new challenges f ... Full text Cite

Multi-Analyst Differential Privacy for Online Query Answering

Journal Article Proceedings of the VLDB Endowment · January 1, 2022 Most differentially private mechanisms are designed for the use of a single analyst. In reality, however, there are often multiple stake-holders with different and possibly conflicting priorities that must share the same privacy loss budget. This motivates ... Full text Cite

Equity and Privacy: More Than Just a Tradeoff

Journal Article IEEE Security and Privacy · November 1, 2021 Full text Cite

Practical Security and Privacy for Database Systems

Conference Proceedings of the ACM SIGMOD International Conference on Management of Data · January 1, 2021 Computing technology has enabled massive digital traces of our personal lives to be collected and stored. These datasets play an important role in numerous real-life applications and research analysis, such as contact tracing for COVID 19, but they contain ... Full text Cite

DP-Sync: Hiding Update Patterns in Secure Outsourced Databases with Differential Privacy

Conference Proceedings of the ACM SIGMOD International Conference on Management of Data · January 1, 2021 In this paper, we consider privacy-preserving update strategies for secure outsourced growing databases. Such databases allow append-only data updates on the outsourced data structure while analysis is ongoing. Despite a plethora of solutions to securely ou ... Full text Cite

Synthesizing Linked Data under Cardinality and Integrity Constraints

Conference Proceedings of the ACM SIGMOD International Conference on Management of Data · January 1, 2021 The generation of synthetic data is useful in multiple aspects, from testing applications to benchmarking to privacy preservation. Generating the links between relations, subject to cardinality constraints (CCs) and integrity constraints (ICs) is an important ... Full text Cite

Budget sharing for multi-analyst differential privacy

Conference Proceedings of the VLDB Endowment · January 1, 2021 Large organizations that collect data about populations (like the US Census Bureau) release summary statistics that are used by multiple stakeholders for resource allocation and policy making problems. These organizations are also legally required to prote ... Full text Cite

Poirot: Private contact summary aggregation: Poster abstract

Conference SenSys 2020 - Proceedings of the 2020 18th ACM Conference on Embedded Networked Sensor Systems · November 16, 2020 Physical distancing between individuals is key to preventing the spread of a disease such as COVID-19. On the one hand, having access to information about physical interactions is critical for decision makers; on the other, this information is sensitive an ... Full text Cite

Computing Local Sensitivities of Counting Queries with Joins

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · June 14, 2020 Local sensitivity of a query Q given a database instance D, i.e. how much the output Q(D) changes when a tuple is added to D or deleted from D, has many applications including query analysis, outlier detection, and differential privacy. However, it is NP-h ... Full text Cite

Crypte: Crypto-Assisted Differential Privacy on Untrusted Servers

Conference Proceedings of the ACM SIGMOD International Conference on Management of Data · June 14, 2020 Differential privacy (DP) is currently the de-facto standard for achieving privacy in data analysis, which is typically implemented either in the "central" or "local" model. The local model has been more popular for commercial deployments as it does not re ... Full text Cite

One-sided differential privacy

Conference Proceedings - International Conference on Data Engineering · April 1, 2020 We study the problem of privacy-preserving data sharing, wherein only a subset of the records in a database is sensitive, possibly based on predefined privacy policies. Existing solutions, viz, differential privacy (DP), are over-pessimistic as they treat ... Full text Cite

ϵKtelo: A framework for defining differentially private computations

Journal Article ACM Transactions on Database Systems · February 1, 2020 The adoption of differential privacy is growing, but the complexity of designing private, efficient, and accurate algorithms is still high. We propose a novel programming framework and system, ϵktelo, for implementing both existing and new privacy algorith ... Full text Cite

Fair decision making using privacy-protected data

Conference FAT* 2020 - Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency · January 27, 2020 Data collected about individuals is regularly used to make decisions that impact those same individuals. We consider settings where sensitive personal data is used to decide who will receive resources or benefits. While it is well known that there is a tra ... Full text Cite

Releasing Earnings Distributions using Differential Privacy: Disclosure Avoidance System for Post-Secondary Employment Outcomes (PSEO)

Journal Article Journal of Privacy and Confidentiality · October 23, 2019 The U.S. Census Bureau recently released data on earnings percentiles of graduates from post-secondary institutions. This paper describes and evaluates the disclosure avoidance system developed for these statistics. We propose a differentially private alg ... Full text Cite

APEX: Accuracy-aware differentially private data exploration

Conference Proceedings of the ACM SIGMOD International Conference on Management of Data · June 25, 2019 Organizations are increasingly interested in allowing external data scientists to explore their sensitive datasets. Due to the popularity of differential privacy, data owners want the data exploration to ensure provable privacy guarantees. However, current ... Full text Cite

Permissions plugins as android apps

Conference MobiSys 2019 - Proceedings of the 17th Annual International Conference on Mobile Systems, Applications, and Services · June 12, 2019 The permissions framework for Android is frustratingly inflexible. Once granted a permission, Android will always allow an app to access the resource until the user manually revokes the app’s permission. Prior work has proposed extensible plugin frameworks ... Full text Cite

Differentially Private Significance Tests for Regression Coefficients

Report · April 3, 2019 Many data producers seek to provide users access to confidential data without unduly compromising data subjects’ privacy and confidentiality. One general strategy is to require users to do analyses without seeing the confidential data; for example, analyst ... Full text Cite

Privstream: Differentially private event detection on data streams

Conference CODASPY 2019 - Proceedings of the 9th ACM Conference on Data and Application Security and Privacy · March 13, 2019 Event monitoring and detection in real-time systems is crucial. Protecting users’ data while reporting an event in almost real-time will increase the level of this challenge. In this work, we adopt the strong notion of differential privacy to private strea ... Full text Cite

ϵKtelo: A framework for defining differentially-private computations

Conference SIGMOD Record · March 1, 2019 The adoption of differential privacy is growing but the complexity of designing private, efficient and accurate algorithms is still high. We propose a novel programming framework and system, ϵktelo, for implementing both existing and new privacy algorithms ... Full text Cite

Privacy Changes Everything

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2019 We are storing and querying datasets with the private information of individuals at an unprecedented scale in settings ranging from IoT devices in smart homes to mining enormous collections of click trails for targeted advertising. Here, the privacy of the ... Full text Cite

Architecting a differentially private SQL engine

Conference CIDR 2019 - 9th Biennial Conference on Innovative Data Systems Research · January 1, 2019 In recent years, differential privacy (DP) has emerged as the state-of-the-art for privately analyzing sensitive data. Despite its wide acceptance in the academic community and much work on differentially private algorithm design, there is surprisingly lit ... Cite

Capacity bounded differential privacy

Conference Advances in Neural Information Processing Systems · January 1, 2019 Differential privacy has emerged as the gold standard for measuring the risk posed by an algorithm's output to the privacy of a single individual in a dataset. It is defined as the worst-case distance between the output distributions of an algorithm that i ... Cite

Analyzing your location data with provable privacy guarantees

Chapter · October 26, 2018 The ubiquity of smartphones and wearable devices coupled with the ability to sense locations through these devices has brought location privacy into the forefront of public debate. Location information is actively collected to help improve ad targeting, pr ... Full text Cite

IoT-Detective: Analyzing IoT data under differential privacy

Conference Proceedings of the ACM SIGMOD International Conference on Management of Data · May 27, 2018 The success of emerging IoT applications depends on integrating privacy protections into the IoT infrastructures to guard against privacy risks posed by sensor-based continuous monitoring of individuals and their activities. This demonstration adapts a rec ... Full text Cite

Is my model any good: differentially private regression diagnostics

Journal Article Knowledge and Information Systems · January 1, 2018 Linear and logistic regression are popular statistical techniques for analyzing multi-variate data. Typically, analysts do not simply posit a particular form of the regression model, estimate its parameters, and use the results for inference or prediction. ... Full text Cite

Differentially private hierarchical count-of-counts histograms

Conference Proceedings of the VLDB Endowment · January 1, 2018 We consider the problem of privately releasing a class of queries that we call hierarchical count-of-counts histograms. Count-of-counts histograms partition the rows of an input table into groups (e.g., group of people in the same household), and for eve ... Full text Cite

Shrinkwrap: Efficient SQL query processing in differentially private data federations

Conference Proceedings of the VLDB Endowment · January 1, 2018 A private data federation is a set of autonomous databases that share a unified query interface offering in-situ evaluation of SQL queries over the union of the sensitive data of its members. Owing to privacy concerns, these systems do not have a trusted d ... Full text Cite

Optimizing error of high-dimensional statistical queries under differential privacy

Conference Proceedings of the VLDB Endowment · January 1, 2018 Differentially private algorithms for answering sets of predicate counting queries on a sensitive database have many applications. Organizations that collect individual-level data, such as statistical agencies and medical institutions, use them to safely r ... Full text Cite

PSynDB: Accurate and accessible private data generation

Conference Proceedings of the VLDB Endowment · January 1, 2018 Across many application domains, trusted parties who collect sensitive information need mechanisms to safely disseminate data. A favored approach is to generate synthetic data: a dataset similar to the original, hopefully retaining its statistical features ... Full text Cite

PrivateSQL: A differentially private SQL query engine

Conference Proceedings of the VLDB Endowment · January 1, 2018 Differential privacy is considered a de facto standard for private data analysis. However, the definition and much of the supporting literature applies to flat tables. While there exist variants of the definition and specialized algorithms for specific typ ... Full text Cite

PeGaSus: Data-Adaptive differentially private stream processing

Conference Proceedings of the ACM Conference on Computer and Communications Security · October 30, 2017 Individuals are continually observed by an ever-increasing number of sensors that make up the Internet of Things. The resulting streams of data, which are analyzed in real time, can reveal sensitive personal information about individuals. Hence, there is a ... Full text Cite

Composing Differential Privacy and Secure Computation: A case study on scaling private record linkage

Conference Proceedings of the ACM Conference on Computer and Communications Security · October 30, 2017 Private record linkage (PRL) is the problem of identifying pairs of records that are similar as per an input matching rule from databases held by two parties that do not trust one another. We identify three key desiderata that a PRL solution must ensure: ( ... Full text Cite

ePrivateEye: To the edge and beyond!

Conference 2017 2nd ACM/IEEE Symposium on Edge Computing, SEC 2017 · October 12, 2017 Edge computing offers resource-constrained devices low-latency access to high-performance computing infrastructure. In this paper, we present ePrivateEye, an implementation of PrivateEye that offloads computationally expensive computer-vision processing to a ... Full text Cite

Protecting Visual Secrets Using Adversarial Nets

Conference IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops · August 22, 2017 Protecting visual secrets is an important problem due to the prevalence of cameras that continuously monitor our surroundings. Any viable solution to this problem should also minimize the impact on the utility of applications that use images. In this work, ... Full text Cite

Pythia: Data dependent differentially private algorithm selection

Conference Proceedings of the ACM SIGMOD International Conference on Management of Data · May 9, 2017 Differential privacy has emerged as a preferred standard for ensuring privacy in analysis tasks on sensitive datasets. Recent algorithms have allowed for significantly lower error by adapting to properties of the input data. These so-called data-dependent ... Full text Cite

Differential privacy in the wild: A tutorial on current practices & open challenges

Conference Proceedings of the ACM SIGMOD International Conference on Management of Data · May 9, 2017 Differential privacy has emerged as an important standard for privacy preserving computation over databases containing sensitive information about individuals. Research on differential privacy spanning a number of research areas, including theory, security ... Full text Cite

Utility cost of formal privacy for releasing national employer-employee statistics

Conference Proceedings of the ACM SIGMOD International Conference on Management of Data · May 9, 2017 National statistical agencies around the world publish tabular summaries based on combined employer-employee (ER-EE) data. The privacy of both individuals and business establishments that feature in these data are protected by law in most countries. These ... Full text Cite

DIAS: Differentially private interactive algorithm selection using Pythia

Conference Proceedings of the ACM SIGMOD International Conference on Management of Data · May 9, 2017 Differential privacy has emerged as the dominant privacy standard for data analysis. Its wide acceptance has led to significant development of algorithms that meet this rigorous standard. For some tasks, such as the task of answering low dimensional counti ... Full text Cite

Directed edge recommender system

Conference WSDM 2017 - Proceedings of the 10th ACM International Conference on Web Search and Data Mining · February 2, 2017 Recommender systems have become ubiquitous in online applications where companies personalize the user experience based on explicit or inferred user preferences. Most modern recommender systems concentrate on finding relevant items for each individual ... Full text Cite

Preface

Conference Journal of Ambient Intelligence and Smart Environments · January 1, 2017 Full text Cite

Ayumu: Efficient lifelogging with focused tasks

Conference MobiCASE 2016 - 8th EAI International Conference on Mobile Computing, Applications and Services · December 1, 2016 Today’s lifelogging devices capture images periodically without considering what data is important to users. Due to their small form factors and limited battery capacities, these lifeloggers are bound to miss important data either because they record at a ... Full text Cite

Differentially private regression diagnostics

Conference Proceedings - IEEE International Conference on Data Mining, ICDM · July 2, 2016 Linear and logistic regression are popular statistical techniques for analyzing multi-variate data. Typically, analysts do not simply posit a particular form of the regression model, estimate its parameters, and use the results for inference or prediction. ... Full text Cite

Principled evaluation of differentially private algorithms using DPBENCH

Conference Proceedings of the ACM SIGMOD International Conference on Management of Data · June 26, 2016 Differential privacy has become the dominant standard in the research community for strong privacy protection. There has been a flood of research into query answering algorithms that meet this standard. Algorithms are becoming increasingly complex, and in ... Full text Cite

Exploring privacy-accuracy tradeoffs using DPComp

Conference Proceedings of the ACM SIGMOD International Conference on Management of Data · June 26, 2016 The emergence of differential privacy as a primary standard for privacy protection has led to the development, by the research community, of hundreds of algorithms for various data analysis tasks. Yet deployment of these techniques has been slowed by the c ... Full text Cite

What you mark is what apps see

Conference MobiSys 2016 - Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services · June 20, 2016 Users are increasingly vulnerable to inadvertently leaking sensitive information through cameras. In this paper, we investigate an approach to mitigating the risk of such inadvertent leaks called privacy markers. Privacy markers give users fine-grained con ... Full text Cite

Design of policy-aware differentially private algorithms

Chapter · January 1, 2016 The problem of designing error optimal differentially private algorithms is well studied. Recent work applying differential privacy to real world settings have used variants of differential privacy that appropriately modify the notion of neighboring databa ... Cite

Designing statistical privacy for your data

Journal Article Communications of the ACM · March 1, 2015 Preparing data for public release requires significant attention to fundamental principles of privacy. If a privacy definition is chosen wisely by the data curator, the sensitive information will be protected. Algorithms that satisfy the spec are called pr ... Full text Cite

DPT: Differentially private trajectory synthesis using hierarchical reference systems

Chapter · January 1, 2015 GPS-enabled devices are now ubiquitous, from airplanes and cars to smartphones and wearable technology. This has resulted in a wealth of data about the movements of individuals and populations, which can be analyzed for useful information to aid in city an ... Cite

A demonstration of VisDPT: Visual exploration of differentially private trajectories

Conference Proceedings of the VLDB Endowment · January 1, 2015 The release of detailed taxi trips has motivated numerous useful studies, but has also triggered multiple privacy attacks on individuals' trips. Despite these attacks, no tools are available for systematically analyzing the privacy risk of released traject ... Full text Cite

Differential privacy in the wild: A tutorial on current practices and open challenges

Conference Proceedings of the VLDB Endowment · January 1, 2015 Differential privacy has emerged as an important standard for privacy preserving computation over databases containing sensitive information about individuals. Research on differential privacy spanning a number of research areas, including theory, security ... Full text Cite

Privacy preserving interactive record linkage (PPIRL).

Journal Article Journal of the American Medical Informatics Association : JAMIA · March 2014 Objective: Record linkage to integrate uncoordinated databases is critical in biomedical research using Big Data. Balancing privacy protection against the need for high quality record linkage requires a human-machine hybrid system to safely manage u ... Full text Cite

Pufferfish: A framework for mathematical privacy definitions

Journal Article ACM Transactions on Database Systems · January 1, 2014 In this article, we introduce a new and general privacy framework called Pufferfish. The Pufferfish framework can be used to create new privacy definitions that are customized to the needs of a given application. The goal of Pufferfish is to allow experts ... Full text Cite

Social genome: Putting big data to work for population informatics

Journal Article Computer · January 1, 2014 Data-intensive research using distributed, federated, person-level datasets in near real time has the potential to transform social, behavioral, economic, and health sciences - but issues around privacy, confidentiality, access, and data integration have s ... Full text Cite

Blowfish privacy: Tuning privacy-utility trade-offs using policies

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · January 1, 2014 Privacy definitions provide ways for trading-off the privacy of individuals in a statistical database for the utility of downstream analysis of the data. In this paper, we present Blowfish, a class of privacy definitions inspired by the Pufferfish framewor ... Full text Cite

MarkIt: Privacy markers for protecting visual secrets

Journal Article UbiComp 2014 - Adjunct Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing · January 1, 2014 The increasing popularity of wearable devices that continuously capture video, and the prevalence of third-party applications that utilize these feeds have resulted in a new threat to privacy. In many situations, sensitive objects/regions are maliciously ( ... Full text Cite

Finding connected components in map-reduce in logarithmic rounds

Journal Article Proceedings - International Conference on Data Engineering · August 15, 2013 Given a large graph G = (V, E) with millions of nodes and edges, how do we compute its connected components efficiently? Recent work addresses this problem in map-reduce, where a fundamental trade-off exists between the number of map-reduce rounds and the ... Full text Cite

Curso: Protect yourself from curse of attribute inference: A social network privacy-analyzer

Journal Article Proceedings of the ACM SIGMOD Workshop on Databases and Social Networks, DBSocial 2013 · July 26, 2013 While social networking platforms allow users to control how their private information is shared, recent research has shown that a user's sensitive attribute can be inferred based on friendship links and group memberships, even when the attribute value is ... Cite

Sparsi: Partitioning sensitive data amongst multiple adversaries

Journal Article Proceedings of the VLDB Endowment · January 1, 2013 We present SPARSI, a novel theoretical framework for partitioning sensitive data across multiple non-colluding adversaries. Most work in privacy-aware data sharing has considered disclosing summaries where the aggregate information about the data is preser ... Full text Cite

Scalable Social Coordination with Group Constraints using Enmeshed Queries

Conference CIDR 2013 - 6th Biennial Conference on Innovative Data Systems Research · January 1, 2013 While specific forms of social coordination appear in tools such as Meetup and in game platforms such as XBox LIVE, we introduce a more general model using what we call enmeshed queries. An enmeshed query allows users to declaratively specify an intent to ... Cite

An automatic blocking mechanism for large-scale de-duplication tasks

Journal Article ACM International Conference Proceeding Series · December 19, 2012 De-duplication - identification of distinct records referring to the same real-world entity - is a well-known challenge in data integration. Since very large datasets prohibit the comparison of every pair of records, blocking has been identified as a techn ... Full text Cite

Challenges in enabling social application at scale: CloudDB'12 invited keynote talk

Journal Article International Conference on Information and Knowledge Management, Proceedings · December 10, 2012 Internet users spend billions of minutes per month on social networking sites like Facebook, LinkedIn and Twitter. Not only do they create tons of data everyday in the form of posts, tweets and photos, the connections between users have given rise to new ... Full text Cite

A rigorous and customizable framework for privacy

Journal Article Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems · June 26, 2012 In this paper we introduce a new and general privacy framework called Pufferfish. The Pufferfish framework can be used to create new privacy definitions that are customized to the needs of a given application. The goal of Pufferfish is to allow experts in ... Full text Cite

Information integration over time in unreliable and uncertain environments

Journal Article WWW'12 - Proceedings of the 21st Annual Conference on World Wide Web · May 16, 2012 Often an interesting true value such as a stock price, sports score, or current temperature is only available via the observations of noisy and potentially conflicting sources. Several techniques have been proposed to reconcile these conflicts by computing ... Full text Cite

Publishing search logs - A comparative study of privacy guarantees

Journal Article IEEE Transactions on Knowledge and Data Engineering · February 6, 2012 Search engine companies collect the database of intentions, the histories of their users' search queries. These search logs are a gold mine for researchers. Search engine companies, however, are wary of publishing search logs in order not to disclose sensi ... Full text Cite

An analysis of structured data on the web

Journal Article Proceedings of the VLDB Endowment · January 1, 2012 In this paper, we analyze the nature and distribution of structured data on the Web. Web-scale information extraction, or the problem of creating structured tables using extraction from the entire web, is gathering lots of research interest. We perform a s ... Full text Cite

Entity resolution: Theory, practice & open challenges

Journal Article Proceedings of the VLDB Endowment · January 1, 2012 This tutorial brings together perspectives on ER from a variety of fields, including databases, machine learning, natural language processing and information retrieval, to provide, in one setting, a survey of a large body of work. We discuss both the pract ... Full text Cite

Preface

Journal Article Proceedings - IEEE International Conference on Data Mining, ICDM · December 1, 2011 Full text Cite

Highly efficient algorithms for structural clustering of large websites

Journal Article Proceedings of the 20th International Conference on World Wide Web, WWW 2011 · December 1, 2011 In this paper, we present a highly scalable algorithm for structurally clustering webpages for extraction. We show that, using only the URLs of the webpages and simple content features, it is possible to cluster webpages effectively and efficiently. At the ... Full text Cite

Collective extraction from heterogeneous web lists

Journal Article Proceedings of the 4th ACM International Conference on Web Search and Data Mining, WSDM 2011 · March 14, 2011 Automatic extraction of structured records from inconsistently formatted lists on the web is challenging: different lists present disparate sets of attributes with variations in the ordering of attributes; many lists contain additional attributes and noise ... Full text Cite

Load balancing and range queries in P2P systems using P-ring

Journal Article ACM Transactions on Internet Technology · March 1, 2011 In peer-to-peer (P2P) systems, computers from around the globe share data and can participate in distributed computation. P2P became famous, and infamous, due to file-sharing systems like Napster. However, the scalability and robustness of these systems ma ... Full text Cite

Personalized social recommendations accurate or private?

Journal Article Proceedings of the VLDB Endowment · January 1, 2011 With the recent surge of social networks such as Facebook, new forms of recommendations have become possible - recommendations that rely on one's social connections in order to make personalized recommendations of ads, content, products, and people. Since re ... Full text Cite

Sampling hidden objects using nearest-neighbor oracles

Journal Article Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining · January 1, 2011 Given an unknown set of objects embedded in the Euclidean plane and a nearest-neighbor oracle, how to estimate the set size and other properties of the objects? In this paper we address this problem. We propose an efficient method that uses the Voronoi par ... Full text Cite

Feed following: The big data challenge in social applications

Journal Article Workshop on Databases and Social Networks, DBSocial'11 · January 1, 2011 Internet users spend billions of minutes per month on sites like Facebook and Twitter. These sites support feed following, where users "follow" activity streams associated with other users and entities. Followers get personalized feeds that blend streams p ... Full text Cite

No free lunch in data privacy

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · January 1, 2011 Differential privacy is a powerful tool for providing privacy-preserving noisy query answers over statistical databases. It guarantees that the distribution of noisy query answers changes very little with the addition or deletion of any tuple. It is freque ... Full text Cite

Privacy in data publishing

Journal Article Proceedings - International Conference on Data Engineering · June 2, 2010 This tutorial gives an overview of techniques for releasing data about individuals while preserving privacy. © 2010 IEEE. ... Full text Cite

Privacy-Preserving Data Publishing

Journal Article Foundations and Trends in Databases · December 31, 2009 Privacy is an important issue when one wants to make use of data that involves individuals' sensitive information. Research on protecting the privacy of individuals and the confidentiality of data has received contributions from many fields, including comp ... Full text Cite

Data Publishing against Realistic Adversaries

Journal Article Proceedings of the VLDB Endowment · January 1, 2009 Privacy in data publishing has received much attention recently. The key to defining privacy is to model knowledge of the attacker - if the attacker is assumed to know too little, the published data can be easily attacked, if the attacker is assumed to kno ... Full text Cite

Privacy: Theory meets practice on the map

Journal Article Proceedings - International Conference on Data Engineering · October 1, 2008 In this paper, we propose the first formal privacy analysis of a data anonymization process known as the synthetic data generation, a technique becoming popular in the statistics community. The target application for this work is a mapping program that sho ... Full text Cite

Scalable ranked publish/subscribe

Journal Article Proceedings of the VLDB Endowment · January 1, 2008 Publish/subscribe (pub/sub) systems are designed to efficiently match incoming events (e.g., stock quotes) against a set of subscriptions (e.g., trader profiles specifying quotes of interest). However, current pub/sub systems only support a simple binary n ... Full text Cite

Methodology for thermal aware topologies and partitioning with better lateral spreading

Journal Article Proceedings of the International Conference on Microelectronics, ICM · 2008 As the temperature became a first-class design metric due to increased power densities, one of the key challenges in deep sub-micron technologies is to guarantee thermal safety while minimizing the performance impact. This highlights the need for thermal-a ... Full text Cite

P-ring: An efficient and robust P2P range index structure

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · October 30, 2007 Peer-to-peer systems have emerged as a robust, scalable and decentralized way to share and publish data. In this paper, we propose P-Ring, a new P2P index structure that supports both equality and range queries. P-Ring is fault-tolerant, provides logarithm ... Full text Cite

Worst-case background knowledge for privacy-preserving data publishing

Journal Article Proceedings - International Conference on Data Engineering · September 24, 2007 Recent work has shown the necessity of considering an attacker's background knowledge when reasoning about privacy in data publishing. However, in practice, the data publisher does not know what background knowledge the attacker possesses. Thus, it is impo ... Full text Cite

On the efficiency of checking perfect privacy

Journal Article Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems · December 1, 2006 Privacy-preserving query-answering systems answer queries while provably guaranteeing that sensitive information is kept secret. One very attractive notion of privacy is perfect privacy - a secret is expressed through a query QS, and a query QV is answered o ... Full text Cite

ℓ-Diversity: Privacy beyond k-anonymity

Journal Article Proceedings - International Conference on Data Engineering · October 17, 2006 Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishabl ... Full text Cite

Trusted CVS

Conference ICDEW 2006 - Proceedings of the 22nd International Conference on Data Engineering Workshops · January 1, 2006 The CVS (Concurrent Versions System) software is a popular method for recording modifications to data objects, in addition to concurrent access to data in a multi-user environment. In current implementations, all users have to trust that the CVS server per ... Full text Cite

A storage and indexing framework for P2P systems

Journal Article Thirteenth International World Wide Web Conference Proceedings, WWW2004 · December 1, 2004 We present a modularized storage and indexing framework that cleanly separates the functional components of a P2P system, enabling us to tailor the P2P infrastructure to the specific needs of various Internet applications. ... Cite

A storage and indexing framework for P2P systems

Conference Proceedings of the 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004 · May 19, 2004 We present a modularized storage and indexing framework that cleanly separates the functional components of a P2P system, enabling us to tailor the P2P infrastructure to the specific needs of various Internet applications. ... Full text Cite

An indexing framework for peer-to-peer systems

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · January 1, 2004 Full text Cite

On perfectly secure communication over arbitrary networks

Journal Article Proceedings of the Annual ACM Symposium on Principles of Distributed Computing · December 1, 2002 We study the interplay of network connectivity and perfectly secure message transmission under the corrupting influence of generalized Byzantine adversaries. It is known that in the threshold adversary model, where the Byzantine adversary can corrupt up to ... Cite

Preface

Journal Article Clinics Atlas of Office Procedures · December 1, 2001 Cite

A security assurance framework for component based software development

Journal Article Informatica (Ljubljana) · November 1, 2001 Commercial-off-the-shelf (COTS) components are black box software products. The absence of their code precludes them from any kind of inspection to certify that the code is safe. This increases the security risk for safety-sensitive applications. The appli ... Cite

Diversity!

Journal Article Utah Law Review · 1992 Cite