TY - GEN
T1 - Multi-source uncertain entity resolution at Yad Vashem
AU - Sagi, Tomer
AU - Gal, Avigdor
AU - Barkol, Omer
AU - Bergman, Ruth
AU - Avram, Alexander
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/6/26
Y1 - 2016/6/26
N2 - In this work we describe an entity resolution project performed at Yad Vashem, the central repository of Holocaust-era information. The Yad Vashem dataset is unique with respect to classic entity resolution, by virtue of being both massively multi-source and by requiring multi-level entity resolution. With today's abundance of information sources, this project sets an example for multi-source resolution on a big-data scale. We discuss a set of requirements that led us to choose the MFIBlocks entity resolution algorithm in achieving the goals of the application. We also provide a machine learning approach, based upon decision trees to transform soft clusters into ranked clustering of records, representing possible entities. An extensive empirical evaluation demonstrates the unique properties of this dataset, highlighting the shortcomings of current methods and proposing avenues for future research in this realm.
AB - In this work we describe an entity resolution project performed at Yad Vashem, the central repository of Holocaust-era information. The Yad Vashem dataset is unique with respect to classic entity resolution, by virtue of being both massively multi-source and by requiring multi-level entity resolution. With today's abundance of information sources, this project sets an example for multi-source resolution on a big-data scale. We discuss a set of requirements that led us to choose the MFIBlocks entity resolution algorithm in achieving the goals of the application. We also provide a machine learning approach, based upon decision trees to transform soft clusters into ranked clustering of records, representing possible entities. An extensive empirical evaluation demonstrates the unique properties of this dataset, highlighting the shortcomings of current methods and proposing avenues for future research in this realm.
UR - http://www.scopus.com/inward/record.url?scp=84979663914&partnerID=8YFLogxK
U2 - 10.1145/2882903.2903737
DO - 10.1145/2882903.2903737
M3 - Conference contribution
AN - SCOPUS:84979663914
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 807
EP - 819
BT - SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data
T2 - 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016
Y2 - 26 June 2016 through 1 July 2016
ER -