Predicting Fact Contributions from Query Logs with Machine Learning

Dana Arad, Daniel Deutch, Nave Frost

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


A recent line of work has proposed to quantify the contribution of database tuples to query answers using Shapley values, a game-theoretic function that has been extensively used as a means of attribution in other areas, notably Machine Learning. In this paper we analyze and evaluate LearnShapley, a solution that employs Machine Learning to rank input facts based on their estimated (Shapley-based) contribution to query answers. LearnShapley is trained on a corpus of SPJU queries, their outputs, and the Shapley values of each input tuple with respect to each output tuple. At inference time, LearnShapley is given a new SPJU query over the same database schema, an output tuple of interest, and its lineage (i.e., the set of all facts that have contributed in some way to the generation of the tuple). Our experiments evaluate to what extent LearnShapley is able to leverage similarity measures between the query at hand and the queries stored in the repository to compute a ranking of the facts in the lineage based on their contribution. Overall, our experiments indicate that a log of past queries, output tuples, and their Shapley values carries a reasonably relevant signal for predicting the ranking of fact contributions for a new SPJU query over the same database. Both DBShap and our code are publicly available, and may serve for further investigation of Machine Learning approaches for explainability in databases.
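To make the quantity being predicted concrete, the following is a minimal sketch of the exact Shapley value of each fact in a lineage with respect to one output tuple. It is not the LearnShapley method and is not taken from the authors' code; it only illustrates the ground-truth scores that such a model would try to rank. The function names `shapley_values` and `rank_facts`, and the representation of the lineage as a list of alternative derivations (sets of facts that jointly produce the output tuple), are assumptions made for illustration. The brute-force enumeration is exponential in the number of facts, so it is only usable on tiny lineages.

```python
from itertools import combinations
from math import factorial


def shapley_values(facts, derivations):
    """Exact Shapley value of each fact for a single output tuple.

    facts: list of hashable fact identifiers (the lineage).
    derivations: iterable of fact sets (hypothetical representation);
        the output tuple is produced by a subset of facts if that
        subset fully contains at least one derivation.
    """
    facts = list(facts)
    n = len(facts)
    derivations = [frozenset(d) for d in derivations]

    def value(subset):
        # 1 if the output tuple is derivable from `subset`, else 0.
        return 1 if any(d <= subset for d in derivations) else 0

    scores = {}
    for f in facts:
        others = [g for g in facts if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = frozenset(combo)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        scores[f] = total
    return scores


def rank_facts(facts, derivations):
    """Facts sorted by decreasing Shapley value."""
    scores = shapley_values(facts, derivations)
    return sorted(scores, key=scores.get, reverse=True)
```

For example, with facts `a`, `b`, `c` and derivations `{a, b}` and `{a, c}`, fact `a` appears in every derivation and receives a Shapley value of 2/3, while `b` and `c` receive 1/6 each, so `rank_facts` places `a` first.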

Original language: English
Title of host publication: Advances in Database Technology - EDBT
Number of pages: 13
ISBN (Electronic): 9783893180912, 9783893180943, 9783893180950
State: Published - 18 Mar 2024
Event: 27th International Conference on Extending Database Technology, EDBT 2024 - Paestum, Italy
Duration: 25 Mar 2024 to 28 Mar 2024

Publication series

Name: Advances in Database Technology - EDBT


Conference: 27th International Conference on Extending Database Technology, EDBT 2024

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Software
  • Computer Science Applications

