Contribution Maximization in Probabilistic Datalog

Tova Milo, Yuval Moskovitch, Brit Youngmann

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

The use of probabilistic datalog programs has been recently advocated for applications that involve recursive computation and uncertainty. While using such programs allows for a flexible knowledge derivation, it makes the analysis of query results a challenging task. Particularly, given a set O of output tuples and a number k, one would like to understand which k-size subset of the input tuples have contributed the most to the derivation of O. This is useful for multiple tasks, such as identifying the critical sources of errors and understanding surprising results. Previous works have mainly focused on the quantification of tuples contribution to a query result in non-recursive SQL queries, very often disregarding probabilistic inference. To quantify the contribution in probabilistic datalog programs, one must account for the recursive relations between input and output data, and the uncertainty. To this end, we formalize the Contribution Maximization (CM) problem. We then reduce CM to the well-studied Influence Maximization (IM) problem, showing that we can harness techniques developed for IM to our setting. However, we show that such naïve adoption results in poor performance. To overcome this, we propose an optimized algorithm which injects a refined variant of the classic Magic Sets technique, integrated with a sampling method, into IM algorithms, achieving a significant saving of space and execution time. Our experiments demonstrate the effectiveness of our algorithm, even where the naïve approach is infeasible.

Original languageEnglish
Title of host publicationProceedings - 2020 IEEE 36th International Conference on Data Engineering, ICDE 2020
PublisherIEEE Computer Society
Pages817-828
Number of pages12
ISBN (Electronic)9781728129037
DOIs
StatePublished - 1 Apr 2020
Event36th IEEE International Conference on Data Engineering, ICDE 2020 - Dallas, United States
Duration: 20 Apr 202024 Apr 2020

Publication series

NameProceedings - International Conference on Data Engineering
Volume2020-April

Conference

Conference36th IEEE International Conference on Data Engineering, ICDE 2020
Country/TerritoryUnited States
CityDallas
Period20/04/2024/04/20

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Information Systems

Fingerprint

Dive into the research topics of 'Contribution Maximization in Probabilistic Datalog'. Together they form a unique fingerprint.

Cite this