Multi-source uncertain entity resolution at Yad Vashem

Tomer Sagi, Avigdor Gal, Omer Barkol, Ruth Bergman, Alexander Avram

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this work we describe an entity resolution project performed at Yad Vashem, the central repository of Holocaust-era information. The Yad Vashem dataset is unique with respect to classic entity resolution by virtue of being massively multi-source and requiring multi-level entity resolution. With today's abundance of information sources, this project sets an example for multi-source resolution on a big-data scale. We discuss a set of requirements that led us to choose the MFIBlocks entity resolution algorithm in achieving the goals of the application. We also provide a machine learning approach, based upon decision trees, to transform soft clusters into ranked clustering of records, representing possible entities. An extensive empirical evaluation demonstrates the unique properties of this dataset, highlighting the shortcomings of current methods and proposing avenues for future research in this realm.

Original language: English
Title of host publication: SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data
Pages: 807-819
Number of pages: 13
ISBN (Electronic): 9781450335317
State: Published - 26 Jun 2016
Event: 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016 - San Francisco, United States
Duration: 26 Jun 2016 → 1 Jul 2016

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
Volume: 26-June-2016
ISSN (Print): 0730-8078

Conference

Conference: 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016
Country/Territory: United States
City: San Francisco
Period: 26/06/16 → 1/07/16

ASJC Scopus subject areas

  • Software
  • Information Systems
