QOCO: A Query Oriented Data Cleaning System with Oracles

Moria Bergman, Tova Milo, Slava Novgorodov, Wang Chiew Tan

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

As key decisions are often made based on information contained in a database, it is important for the database to be as complete and correct as possible. For this reason, many data cleaning tools have been developed to automatically resolve inconsistencies in databases. However, data cleaning tools provide only best-effort results and usually cannot eradicate all errors that may exist in a database. Even more importantly, existing data cleaning tools do not typically address the problem of determining what information is missing from a database. To tackle these problems, we present QOCO, a novel query oriented cleaning system that leverages materialized views defined by user queries as a trigger for identifying the remaining incorrect or missing information. Given a user query, QOCO interacts with domain experts (whom we model as oracle crowds) to identify potentially wrong or missing answers in the result of the user query, and to determine and correct the wrong data that causes the error(s). We will demonstrate QOCO over a World Cup Games database and illustrate the interaction between QOCO and the oracles. Our demo audience will play the role of oracles, and we will show how QOCO's underlying operations and optimization mechanisms can effectively prune the search space and minimize the number of questions that must be posed, accelerating the cleaning process.
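The abstract describes QOCO tracing a wrong query answer back to the base data that causes it while keeping the number of oracle questions small. The Python sketch below illustrates one way such a step could look; it is an assumption made for illustration, not QOCO's published algorithm. It assumes that a wrong answer of a join query comes with its witnesses (sets of base tuples that together produce the answer), that the answer disappears only once every witness loses at least one tuple, and that the oracle is a hypothetical callable reporting whether a given tuple is wrong. The function name `clean_wrong_answer` and the tuple ids such as `Teams(A)` are hypothetical. Asking first about the tuple shared by the most unresolved witnesses is one simple greedy way to prune questions.

```python
from collections import Counter
from typing import Callable, FrozenSet, List, Set

# A witness is a set of base-tuple ids (hypothetical labels such as "Games(7)")
# that together produce one answer of the user query.
Witness = FrozenSet[str]


def clean_wrong_answer(
    witnesses: List[Witness],
    oracle_is_wrong: Callable[[str], bool],
) -> Set[str]:
    """Return the base tuples the oracle confirms as wrong, so that every
    witness of the wrong answer loses at least one tuple.

    Greedy heuristic (an assumption, not QOCO's published algorithm): ask
    about the tuple that occurs in the most witnesses still intact.
    """
    to_delete: Set[str] = set()
    verified_correct: Set[str] = set()
    remaining = list(witnesses)
    while remaining:
        # Count how often each not-yet-verified tuple occurs in intact witnesses.
        counts = Counter(
            t for w in remaining for t in w if t not in verified_correct
        )
        if not counts:
            # Every tuple in the remaining witnesses was verified as correct,
            # so the oracle's answers contradict "this answer is wrong".
            raise ValueError("inconsistent oracle answers")
        # Most frequent tuple first; ties broken by id for determinism.
        candidate = min(counts, key=lambda t: (-counts[t], t))
        if oracle_is_wrong(candidate):
            to_delete.add(candidate)
            # Deleting the tuple breaks every witness that contains it.
            remaining = [w for w in remaining if candidate not in w]
        else:
            verified_correct.add(candidate)
    return to_delete
```

For example, if a wrong answer has the witnesses {"Games(1)", "Teams(A)"} and {"Games(2)", "Teams(A)"} and only `Teams(A)` is actually wrong, the greedy order asks about `Teams(A)` first and resolves both witnesses with a single question.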

Original language: English
Title of host publication: Proceedings of the VLDB Endowment
Editors: Simonas Saltenis, Christophe Claramunt, Ki-Joune Li
Publisher: Association for Computing Machinery
Pages: 1900-1903
Number of pages: 4
Volume: 8
Edition: 12
State: Published - 2015
Event: 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006 - Seoul, Korea, Republic of
Duration: 11 Sep 2006 to 11 Sep 2006

Conference

Conference: 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006
Country/Territory: Korea, Republic of
City: Seoul
Period: 11/09/06 to 11/09/06

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • General Computer Science
