Repairing Databases over Metric Spaces with Coincidence Constraints

Youri Kaminsky, Benny Kimelfeld, Ester Livshits, Felix Naumann, David Wajc

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Datasets often contain values that naturally reside in a metric space: numbers, strings, geographical locations, machine-learned embeddings in a vector space, and so on. We study the computational complexity of repairing inconsistent databases that violate integrity constraints, where the database values belong to an underlying metric space. The goal is to update the database values to retain consistency while minimizing the total distance between the original values and the repaired ones. We consider what we refer to as coincidence constraints, which include unary key constraints, inclusion constraints, foreign keys, and generally any restriction on the relationship between the numbers of cells of different labels (attributes) coinciding in a single value, for a fixed attribute set. We begin by showing that the problem is APX-hard for general metric spaces. We then present an algorithm solving the problem optimally for tree metrics, which generalize both the line metric (i.e., where repaired values are numbers) and the discrete metric (i.e., where we simply count the number of changed values). Combining our algorithm for tree metrics and a classic result on probabilistic tree embeddings, we design a (high probability) logarithmic-ratio approximation for general metrics. We also study the variant of the problem where we limit the allowed change of each individual value. In this variant, it is already NP-complete to decide the existence of any legal repair for a general metric, and we present a polynomial-time repairing algorithm for the case of a line metric.
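To make the objective stated in the abstract concrete, the following minimal Python sketch evaluates a candidate repair: it sums the distances between original and repaired values under the line metric and under the discrete metric, and checks a simplified unary key constraint (no two cells of a single attribute share a value). All names (`line_metric`, `discrete_metric`, `repair_cost`, `satisfies_unary_key`) and the toy column are illustrative assumptions, not taken from the paper. The sketch only scores a given repair and does not implement any of the paper's algorithms.

```python
from typing import Callable, Hashable, Sequence


def line_metric(a: float, b: float) -> float:
    """Distance on the real line: absolute difference."""
    return abs(a - b)


def discrete_metric(a: Hashable, b: Hashable) -> float:
    """Discrete metric: 0 if the value is unchanged, 1 otherwise."""
    return 0.0 if a == b else 1.0


def repair_cost(original: Sequence, repaired: Sequence,
                dist: Callable[[object, object], float]) -> float:
    """Total distance between original cell values and their repaired values."""
    if len(original) != len(repaired):
        raise ValueError("a repair must keep the same number of cells")
    return sum(dist(o, r) for o, r in zip(original, repaired))


def satisfies_unary_key(column: Sequence[Hashable]) -> bool:
    """Simplified unary key check (assumption): all values in the column are distinct."""
    return len(set(column)) == len(column)


# Toy single-attribute column that violates the key: the value 3 appears twice.
original = [3, 3, 7]
near_repair = [3, 4, 7]    # moves one cell by 1
far_repair = [3, 10, 7]    # moves the same cell by 7

for candidate in (near_repair, far_repair):
    print(candidate,
          satisfies_unary_key(candidate),
          repair_cost(original, candidate, line_metric),
          repair_cost(original, candidate, discrete_metric))
```

Both candidates change exactly one cell, so they cost the same under the discrete metric, while the line metric prefers the repair that moves the value a shorter distance. This is the sense in which the choice of metric changes which repair is optimal.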

Original language: English
Title of host publication: 28th International Conference on Database Theory, ICDT 2025
Editors: Sudeepa Roy, Ahmet Kara
ISBN (Electronic): 9783959773645
DOIs
State: Published - 21 Mar 2025
Event: 28th International Conference on Database Theory, ICDT 2025 - Barcelona, Spain
Duration: 25 Mar 2025 to 28 Mar 2025

Publication series

Name: Leibniz International Proceedings in Informatics, LIPIcs
Volume: 328

Conference

Conference: 28th International Conference on Database Theory, ICDT 2025
Country/Territory: Spain
City: Barcelona
Period: 25/03/25 to 28/03/25

Keywords

  • coincidence constraints
  • Database repairs
  • foreign-key constraints
  • inclusion constraints
  • metric spaces

All Science Journal Classification (ASJC) codes

  • Software
