Learning to Rerank Schema Matches

Avigdor Gal, Haggai Roitman, Roee Shraga

Research output: Contribution to journal › Article › peer-review

Abstract

Schema matching is at the heart of integrating structured and semi-structured data, with applications in data warehousing, data analysis recommendations, Web table matching, etc. Schema matching is known to be an uncertain process, and a common method of overcoming this uncertainty presents a human expert with a ranked list of possible schema matches to choose from, known as top-$K$ matching. In this work we propose a learning algorithm that utilizes an innovative set of features to rerank a list of schema matches and improve upon the ranking of the best match. We provide a bound on the size of an initial match list, tying the number of matches to a desired level of confidence in finding the best match. We also propose the use of matching predictors as features in a learning task, and tailor nine new matching predictors for this purpose. The proposed algorithm assists the matching process by presenting a high-quality set of alternative matches to a human expert. It also serves as a step towards eliminating the involvement of human experts as decision makers in the matching process altogether. A large-scale empirical evaluation on a real-world benchmark shows the effectiveness of the proposed algorithmic solution.

Original language: English
Article number: 8944172
Pages (from-to): 3104-3116
Number of pages: 13
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 33
Issue number: 8
State: Published - 1 Aug 2021

Keywords

  • Schema matching
  • data integration
  • learning to rerank
  • uncertainty

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
