TY - GEN
T1 - Heterogeneous Data Integration by Learning to Rerank Schema Matches
AU - Gal, Avigdor
AU - Roitman, Haggai
AU - Shraga, Roee
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/12/27
Y1 - 2018/12/27
AB - Schema matching is a task at the heart of integrating heterogeneous structured and semi-structured data with applications in data warehousing, process matching, data analysis recommendations, Web table matching, etc. Schema matching is known to be an uncertain process and a common method of overcoming this uncertainty is by introducing a human expert with a ranked list of possible schema matches from which the expert may choose, known as top-K matching. In this work we propose a learning algorithm that utilizes an innovative set of features to rerank a list of schema matches and improves upon the ranking of the best match. The proposed algorithm assists the matching process by introducing a quality set of alternative matches to a human expert. It also serves as a step towards eliminating the involvement of human experts as decision makers in a matching process altogether. A large scale empirical evaluation with real-world benchmark shows the effectiveness of the proposed algorithmic solution.
KW - Heterogeneous data integration
KW - Learning to rerank
KW - Schema matching
KW - Uncertainty
UR - http://www.scopus.com/inward/record.url?scp=85061389354&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2018.00118
DO - 10.1109/ICDM.2018.00118
M3 - Conference contribution
AN - SCOPUS:85061389354
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 959
EP - 964
BT - 2018 IEEE International Conference on Data Mining, ICDM 2018
T2 - 18th IEEE International Conference on Data Mining, ICDM 2018
Y2 - 17 November 2018 through 20 November 2018
ER -