Ursprung: Provenance for large-scale analytics environments

Publication, Conference
Rupprecht, L; Lubbock, A; Davis, JC; Tyson, D; Arnold, C; Bhagwat, D
Published in: Proceedings of the ACM SIGMOD International Conference on Management of Data
June 25, 2019

Modern analytics has produced wonders, but reproducing and verifying these wonders is difficult. Data provenance helps to solve this problem by collecting information on how data is created and accessed. Although provenance collection techniques have been used successfully on a smaller scale, tracking provenance in large-scale analytics environments is challenging due to the volume of provenance generated and the heterogeneity of the domains involved. Without provenance, analysts struggle to keep track of and reproduce their analyses. We demonstrate Ursprung, a provenance collection system specifically targeted at such environments. Ursprung transparently collects the minimal set of system-level provenance required to track the relationships between data and processes. To collect domain-specific provenance, Ursprung enables users to specify capture rules to curate application-specific logs, intermediate results, etc. To reduce storage overhead and accelerate queries, it uses event hierarchies to synthesize raw provenance into compact summaries.
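The abstract mentions using event hierarchies to synthesize raw provenance into compact summaries. The Python sketch below illustrates only the general idea of rolling low-level file-system events up into per-process summaries. The `RawEvent` type, the operation names, the `HIERARCHY` mapping, and the `summarize` function are all hypothetical and are not Ursprung's actual design or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RawEvent:
    """Hypothetical raw system-level event: a process touching a file."""
    pid: int
    op: str    # hypothetical operation names, e.g. "open", "read", "write", "close"
    path: str


# Hypothetical two-level event hierarchy: low-level operations roll up into a
# higher-level provenance relation. Operations absent from the map (such as
# "open" and "close") carry no data flow by themselves and are dropped.
HIERARCHY = {
    "read": "used",
    "write": "generated",
}


def summarize(events):
    """Collapse raw events into one {used, generated} summary per process."""
    summary = defaultdict(lambda: {"used": set(), "generated": set()})
    for ev in events:
        relation = HIERARCHY.get(ev.op)
        if relation is not None:
            # Repeated reads/writes of the same file collapse into one entry.
            summary[ev.pid][relation].add(ev.path)
    return dict(summary)


if __name__ == "__main__":
    events = [
        RawEvent(42, "open", "/data/in.csv"),
        RawEvent(42, "read", "/data/in.csv"),
        RawEvent(42, "read", "/data/in.csv"),
        RawEvent(42, "write", "/data/out.parquet"),
        RawEvent(42, "close", "/data/out.parquet"),
    ]
    # Five raw events shrink to one summary with one input and one output.
    print(summarize(events))
```

Here five raw events reduce to a single per-process record. A real system would also have to handle process creation, renames, and deletions, which this sketch deliberately ignores.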

Published In

Proceedings of the ACM SIGMOD International Conference on Management of Data

DOI

10.1145/3299869.3320235

ISSN

0730-8078

Publication Date

June 25, 2019

Start / End Page

1989 / 1992

Citation

APA: Rupprecht, L., Lubbock, A., Davis, J. C., Tyson, D., Arnold, C., & Bhagwat, D. (2019). Ursprung: Provenance for large-scale analytics environments. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 1989–1992). https://doi.org/10.1145/3299869.3320235
Chicago: Rupprecht, L., A. Lubbock, J. C. Davis, D. Tyson, C. Arnold, and D. Bhagwat. “Ursprung: Provenance for large-scale analytics environments.” In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1989–92, 2019. https://doi.org/10.1145/3299869.3320235.
ICMJE: Rupprecht L, Lubbock A, Davis JC, Tyson D, Arnold C, Bhagwat D. Ursprung: Provenance for large-scale analytics environments. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2019. p. 1989–92.
MLA: Rupprecht, L., et al. “Ursprung: Provenance for large-scale analytics environments.” Proceedings of the ACM SIGMOD International Conference on Management of Data, 2019, pp. 1989–92. Scopus, doi:10.1145/3299869.3320235.
NLM: Rupprecht L, Lubbock A, Davis JC, Tyson D, Arnold C, Bhagwat D. Ursprung: Provenance for large-scale analytics environments. Proceedings of the ACM SIGMOD International Conference on Management of Data. 2019. p. 1989–1992.