On provenance and privacy

Published

Other Article

Provenance in scientific workflows is a double-edged sword. On the one hand, recording information about the module executions used to produce a data item, as well as the parameter settings and intermediate data items passed between module executions, enables transparency and reproducibility of results. On the other hand, a scientific workflow often contains private or confidential data and uses proprietary modules. Hence, providing exact answers to provenance queries over all executions of the workflow may reveal private information. In this paper we discuss privacy concerns in scientific workflows - data, module, and structural privacy - and frame several natural questions: (i) Can we formally analyze data, module, and structural privacy, giving provable privacy guarantees for an unlimited/bounded number of provenance queries? (ii) How can we answer search and structural queries over repositories of workflow specifications and their executions, providing as much information as possible to the user while still guaranteeing privacy? We then highlight some recent work in this area and point to several directions for future work. © 2011 ACM.

Cited Authors

  • Davidson, SB; Khanna, S; Roy, S; Stoyanovich, J; Tannen, V; Chen, Y

Published Date

  • March 11, 2011

Published In

  • ACM International Conference Proceeding Series

Start / End Page

  • 3 - 10

International Standard Book Number 13 (ISBN-13)

  • 9781450305297

Digital Object Identifier (DOI)

  • 10.1145/1938551.1938554

Citation Source

  • Scopus