Heterogeneous Data Integration by Learning to Rerank Schema Matches

Avigdor Gal, Haggai Roitman, Roee Shraga

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Schema matching is a task at the heart of integrating heterogeneous structured and semi-structured data, with applications in data warehousing, process matching, data analysis recommendations, Web table matching, and more. Schema matching is known to be an uncertain process, and a common method of overcoming this uncertainty is to present a human expert with a ranked list of possible schema matches from which the expert may choose, an approach known as top-K matching. In this work we propose a learning algorithm that utilizes an innovative set of features to rerank a list of schema matches and improves upon the ranking of the best match. The proposed algorithm assists the matching process by introducing a quality set of alternative matches to a human expert. It also serves as a step towards eliminating the involvement of human experts as decision makers in a matching process altogether. A large-scale empirical evaluation on a real-world benchmark shows the effectiveness of the proposed algorithmic solution.

Original language: English
Title of host publication: 2018 IEEE International Conference on Data Mining, ICDM 2018
Pages: 959-964
Number of pages: 6
ISBN (Electronic): 9781538691588
State: Published - 27 Dec 2018
Event: 18th IEEE International Conference on Data Mining, ICDM 2018 - Singapore, Singapore
Duration: 17 Nov 2018 → 20 Nov 2018

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
Volume: 2018-November
ISSN (Print): 1550-4786

Conference

Conference: 18th IEEE International Conference on Data Mining, ICDM 2018
Country/Territory: Singapore
City: Singapore
Period: 17/11/18 → 20/11/18

Keywords

  • Heterogeneous data integration
  • Learning to rerank
  • Schema matching
  • Uncertainty

ASJC Scopus subject areas

  • General Engineering
