A study of the integration of passage-, document-, and cluster-based information for re-ranking search results

Eyal Krikon, Oren Kurland

Research output: Contribution to journal › Article › peer-review

Abstract

Cluster-based and passage-based document retrieval paradigms have been shown to be effective. While the former utilizes query-related corpus context manifested in clusters of similar documents, the latter addresses the fact that a document can be relevant even if only a very small part of it contains query-pertaining information. Hence, cluster-based approaches can be viewed as "expanding" the document representation, while passage-based approaches can be thought of as utilizing a "contracted" document representation. We present a study of the relative benefits of using each of these two approaches, and of the potential merits of their integration. To that end, we devise two methods that integrate whole-document-based, cluster-based and passage-based information. The methods are applied to the re-ranking task, that is, re-ordering documents in an initially retrieved list so as to improve precision at the very top ranks. Extensive empirical evaluation attests to the potential merits of integrating these information types. Specifically, the resultant performance substantially exceeds that of the initial ranking and is often better than that of a state-of-the-art pseudo-feedback-based query expansion approach.

Original language: English
Pages (from-to): 593-616
Number of pages: 24
Journal: Information Retrieval
Volume: 14
Issue number: 6
State: Published - Dec 2011

Keywords

  • Ad hoc retrieval
  • Cluster-based language models
  • Clusters
  • Passage-based language models
  • Passages
  • Re-ranking

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences
