Logical provenance in data-oriented workflows

R. Ikeda, Akash Das Sarma, J. Widom

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

20 Scopus citations

Abstract

We consider the problem of defining, generating, and tracing provenance in data-oriented workflows, in which input data sets are processed by a graph of transformations to produce output results. We first give a new general definition of provenance for general transformations, introducing the notions of correctness, precision, and minimality. We then determine when properties such as correctness and minimality carry over from the individual transformations' provenance to the workflow provenance. We describe a simple logical-provenance specification language consisting of attribute mappings and filters. We provide an algorithm for provenance tracing in workflows where logical provenance for each transformation is specified using our language. We consider logical provenance in the relational setting, observing that for a class of Select-Project-Join (SPJ) transformations, logical provenance specifications encode minimal provenance. We have built a prototype system supporting the features and algorithms presented in the paper, and we report a few preliminary experimental results. © 2013 IEEE.
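The abstract names the ingredients (per-transformation logical provenance written as attribute mappings and filters, composed into workflow provenance) but does not spell out the language or the tracing algorithm. The following is therefore only a minimal sketch under our own assumptions, not the paper's prototype. We assume that a mapping `(out_attr, input_set, in_attr)` means "a contributing input item has `in_attr` equal to the output item's `out_attr`", that a filter is a per-input predicate, and that workflow provenance is obtained by composing one-level provenance backward to the base data sets. All names (`Transformation`, `one_level_provenance`, `trace`, and the toy `Sales`/`Stores` workflow) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

Record = dict  # a data item: attribute name -> value


@dataclass
class Transformation:
    """Hypothetical workflow node carrying a logical-provenance specification."""
    name: str
    inputs: list[str]  # names of the data sets this transformation reads
    # Assumed attribute mappings: (output attribute, input data set, input attribute),
    # read as "a contributing input item has input.attr == output.attr".
    mappings: list[tuple[str, str, str]] = field(default_factory=list)
    # Assumed filters: input data set -> predicate an input item must satisfy.
    filters: dict[str, Callable[[Record], bool]] = field(default_factory=dict)


def one_level_provenance(t, out_rec, data):
    """Input items of t that agree with out_rec on every mapped attribute and pass the filter."""
    result = {}
    for ds in t.inputs:
        pairs = [(o, a) for (o, d, a) in t.mappings if d == ds]
        keep = t.filters.get(ds, lambda r: True)
        result[ds] = [r for r in data[ds]
                      if keep(r) and all(r[a] == out_rec[o] for o, a in pairs)]
    return result


def trace(producer, dataset, rec, data):
    """Trace rec back to base data sets by composing one-level provenance backward."""
    t = producer.get(dataset)
    if t is None:  # base (source) data set: the item is its own provenance
        return {dataset: [rec]}
    out = {}
    for ds, recs in one_level_provenance(t, rec, data).items():
        for r in recs:
            for base, items in trace(producer, ds, r, data).items():
                bucket = out.setdefault(base, [])
                for item in items:  # dicts are unhashable, so deduplicate by equality
                    if item not in bucket:
                        bucket.append(item)
    return out


# Toy workflow (hypothetical): Sales, Stores -> T1 (filtered join) -> Joined -> T2 (sum by region) -> Totals
sales = [{"store": 1, "item": "a", "qty": 3},
         {"store": 1, "item": "b", "qty": 0},
         {"store": 2, "item": "a", "qty": 5}]
stores = [{"store": 1, "region": "east"}, {"store": 2, "region": "west"}]

# Materialize the intermediate and final data sets so they can be traced.
joined = [{**s, "region": st["region"]} for s in sales for st in stores
          if s["store"] == st["store"] and s["qty"] > 0]
sums = {}
for r in joined:
    sums[r["region"]] = sums.get(r["region"], 0) + r["qty"]
totals = [{"region": k, "total": v} for k, v in sums.items()]

t1 = Transformation("T1", ["Sales", "Stores"],
                    mappings=[("store", "Sales", "store"), ("item", "Sales", "item"),
                              ("qty", "Sales", "qty"), ("store", "Stores", "store"),
                              ("region", "Stores", "region")],
                    filters={"Sales": lambda r: r["qty"] > 0})
t2 = Transformation("T2", ["Joined"], mappings=[("region", "Joined", "region")])

producer = {"Joined": t1, "Totals": t2}  # data set name -> transformation producing it
data = {"Sales": sales, "Stores": stores, "Joined": joined, "Totals": totals}

print(trace(producer, "Totals", totals[0], data))
# {'Sales': [{'store': 1, 'item': 'a', 'qty': 3}], 'Stores': [{'store': 1, 'region': 'east'}]}
```

Because each specification is just equalities plus predicates, tracing here is a selection over stored inputs and never re-runs a transformation. The aggregate `T2` maps only `region`, so the provenance of a regional total is every joined item in that region. This sketch does not check whether the composed result is correct, precise, or minimal; those are exactly the properties whose carry-over from transformations to workflows the paper analyzes.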
Original language: English (US)
Title of host publication: 2013 IEEE 29th International Conference on Data Engineering (ICDE)
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 877-888
Number of pages: 12
ISBN (Print): 9781467349109
State: Published - Apr 2013
Externally published: Yes

Bibliographical note

KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: This work was supported by the National Science Foundation (IIS-0904497), the Boeing Corporation, and a KAUST research grant.
This publication acknowledges KAUST support, but has no KAUST affiliated authors.

