Towards hypothetical reasoning using distributed provenance

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Hypothetical reasoning is the iterative examination of how modifications to the data affect the result of a computation or data analysis query. Data scientists commonly perform this kind of reasoning to gain insights. Previous work has indicated that fine-grained data provenance can make hypothetical reasoning efficient: instead of a costly re-execution of the underlying application, one may assign values to a pre-computed provenance expression. However, current techniques for fine-grained provenance tracking are ill-suited for large-scale data because of the overhead they impose on both execution time and memory consumption. We outline an approach for hypothetical reasoning over large-scale data. Our key insights are (i) to track only the relevant parts of the provenance, based on an a priori specification of the classes of hypothetical scenarios of interest, and (ii) to track provenance in a distributed manner, tailored to distributed data processing frameworks such as Apache Spark. We also discuss the challenges in both respects and our initial directions for addressing them.
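
The idea of assigning values to a pre-computed provenance expression, rather than re-running the application, can be illustrated with a minimal sketch. This is not the authors' system: the toy rows, the function names `build_provenance` and `evaluate`, and the choice of per-category price multipliers as the only class of hypothetical scenarios are all assumptions made for illustration. The sketch makes one pass over the data to build a linear expression for total revenue, then answers "what if" questions by plugging values into that expression.

```python
from collections import defaultdict

# Hypothetical toy data: (product, category, price, quantity).
ROWS = [
    ("pen", "office", 2.0, 10),
    ("desk", "office", 100.0, 1),
    ("apple", "food", 0.5, 40),
]


def build_provenance(rows):
    """Scan the data once and return total revenue as a linear expression.

    The result maps a variable name to its coefficient, so that
    revenue = sum(coefficient * value assigned to the variable).
    Only one variable per category is tracked (an assumed, a priori
    scenario class: "scale all prices in a category by some factor").
    """
    coeffs = defaultdict(float)
    for _product, category, price, qty in rows:
        coeffs[f"m_{category}"] += price * qty
    return dict(coeffs)


def evaluate(provenance, assignment=None):
    """Evaluate the expression; unassigned multipliers default to 1.0."""
    assignment = assignment or {}
    return sum(coef * assignment.get(var, 1.0) for var, coef in provenance.items())


PROV = build_provenance(ROWS)               # computed once, up front
baseline = evaluate(PROV)                   # no hypothetical change
scenario = evaluate(PROV, {"m_food": 1.5})  # "what if food prices rose by 50%?"
```

Each further scenario of the same class costs one evaluation over the (small) expression instead of another scan of the rows. Tracking a variable per category rather than per row is a stand-in for insight (i) in the abstract; a distributed variant, as in insight (ii), would additionally need to build and merge such expressions across partitions, which this sketch does not attempt.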

Original language: English
Title of host publication: Advances in Database Technology - EDBT 2018
Subtitle of host publication: 21st International Conference on Extending Database Technology, Proceedings
Editors: Michael Bohlen, Reinhard Pichler, Norman May, Erhard Rahm, Shan-Hung Wu, Katja Hose
Pages: 461-464
Number of pages: 4
ISBN (Electronic): 9783893180783
DOIs
State: Published - 1 Jan 2018
Event: 21st International Conference on Extending Database Technology, EDBT 2018 - Vienna, Austria
Duration: 26 Mar 2018 to 29 Mar 2018

Publication series

Name: Advances in Database Technology - EDBT
Volume: 2018-March

Conference

Conference: 21st International Conference on Extending Database Technology, EDBT 2018
Country/Territory: Austria
City: Vienna
Period: 26/03/18 to 29/03/18

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Software
  • Computer Science Applications
