Sudeepa Roy

Before joining Duke, I was a postdoctoral research associate in the Department of Computer Science and Engineering, University of Washington, where I worked with Prof. Dan Suciu and the database group.

I graduated from the University of Pennsylvania with a Ph.D. in Computer and Information Science, where I was advised by Prof. Susan Davidson and Prof. Sanjeev Khanna. During my Ph.D., I completed two internships at IBM Research, Almaden, and received a Google PhD Fellowship in Structured Data in 2011.

I obtained my master's and bachelor's degrees in Computer Science from the Indian Institute of Technology Kanpur and Jadavpur University, respectively.

Research Interests

I am broadly interested in data and information management, with a focus on the foundational aspects of big data analysis. My research objective is to help users with heterogeneous backgrounds and interests derive the maximum benefit from the available data. My ongoing work on explanations in databases directly aims to help users gain deep insights into data by providing rich explanations for their questions. My work in data and workflow provenance, probabilistic databases, and crowdsourcing probes compelling, fundamental questions that must be answered to enable end-to-end processing and analysis of today's unstructured, noisy, and unreliable data while preserving its entire context.

Current Appointments & Affiliations

Associate Professor of Computer Science · 2022 - Present Computer Science, Trinity College of Arts & Sciences

Recent Publications

Refining Labeling Functions with Limited Labeled Data

Conference Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining · August 3, 2025 Programmatic weak supervision (PWS) significantly reduces human effort for labeling data by combining the outputs of user-provided labeling functions (LFs) on unlabeled datapoints. However, the quality of the generated labels depends directly on the accura ...

CauSumX: Summarized Causal Explanations For Group-By-Average Queries

Conference Proceedings of the ACM SIGMOD International Conference on Management of Data · June 22, 2025 Group-by-average SQL queries are a cornerstone of data analysis, often employed to uncover patterns and trends within datasets. However, interpreting the results of these queries can be challenging and time-intensive, particularly when working with large, ...

Differentially private explanations for aggregate query answers

Journal Article VLDB Journal · March 1, 2025 Differential privacy (DP) is the state-of-the-art and rigorous notion of privacy for answering aggregate database queries while preserving the privacy of sensitive information in the data. In today’s era of data analysis, however, it poses new challenges f ...


Education, Training & Certifications

University of Pennsylvania · 2012 Ph.D.
