Abstract
Schema matching is at the heart of integrating structured and semi-structured data with applications in data warehousing, data analysis recommendations, Web table matching, etc. Schema matching is known as an uncertain process and a common method to overcome this uncertainty introduces a human expert with a ranked list of possible schema matches to choose from, known as top-$K$K matching. In this work we propose a learning algorithm that utilizes an innovative set of features to rerank a list of schema matches and improves upon the ranking of the best match. We provide a bound on the size of an initial match list, tying the number of matches with a desired level of confidence in finding the best match. We also propose the use of matching predictors as features in a learning task, and tailored nine new matching predictors for this purpose. The proposed algorithm assists the matching process by introducing a quality set of alternative matches to a human expert. It also serves as a step towards eliminating the involvement of human experts as decision makers in a matching process altogether. A large scale empirical evaluation with real-world benchmark shows the effectiveness of the proposed algorithmic solution.
| Original language | English |
|---|---|
| Article number | 8944172 |
| Pages (from-to) | 3104-3116 |
| Number of pages | 13 |
| Journal | IEEE Transactions on Knowledge and Data Engineering |
| Volume | 33 |
| Issue number | 8 |
| DOIs | |
| State | Published - 1 Aug 2021 |
Keywords
- Schema matching
- data integration
- learning to rerank
- uncertainty
All Science Journal Classification (ASJC) codes
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics