Utilizing passage-based language models for ad hoc document retrieval

Michael Bendersky, Oren Kurland

Research output: Contribution to journal › Article › peer-review

Abstract

In the ad hoc retrieval setting, documents relevant to a query may contain very few, often short, parts (passages) with query-related information. To cope with this, researchers have proposed passage-based document ranking approaches. We show that several of these retrieval methods can be understood, and new ones can be derived, using the same probabilistic model. We use language-model estimates to instantiate specific retrieval algorithms, and in doing so present a novel passage language model that integrates information from the containing document to an extent controlled by the estimated document homogeneity. Several document-homogeneity measures that we present yield passage language models that are more effective than the standard passage model, both for basic document retrieval and for constructing and utilizing passage-based relevance models; these relevance models also outperform a document-based relevance model. Finally, we demonstrate the merits of using the document-homogeneity measures for integrating document-query and passage-query similarity information for document retrieval.
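The abstract gives no formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' estimator. It assumes: (1) passages are non-overlapping fixed-length windows; (2) the passage language model is a linear mixture of passage, document, and collection maximum-likelihood models, with a homogeneity score `h` in [0, 1] shifting weight from the passage to its document; (3) a hypothetical length-based homogeneity measure in which shorter documents count as more homogeneous; and (4) a document score that interpolates document and best-passage query log-likelihoods using `h`. All function names, `lam_c`, `size`, and `FLOOR` are hypothetical.

```python
import math
from collections import Counter

FLOOR = 1e-9  # hypothetical probability floor for terms unseen in the collection


def mle(tokens):
    """Maximum-likelihood unigram model of a token list."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()}


def windows(doc, size):
    """Assumed passage type: non-overlapping windows of `size` tokens."""
    return [doc[i:i + size] for i in range(0, len(doc), size)]


def length_homogeneity(doc_len, min_len, max_len):
    """Hypothetical length-based measure: shortest doc -> 1.0, longest -> 0.0."""
    if max_len <= min_len:
        return 1.0
    span = math.log(max_len) - math.log(min_len)
    h = 1.0 - (math.log(doc_len) - math.log(min_len)) / span
    return min(1.0, max(0.0, h))


def passage_log_likelihood(query, passage, doc, coll_lm, h, lam_c=0.3):
    """log p(q | passage) under an assumed passage/document/collection mixture.

    Higher homogeneity h moves probability mass from the passage model
    to the containing document's model; lam_c (> 0) keeps every term nonzero.
    """
    p_psg, p_doc = mle(passage), mle(doc)
    lam_doc = h * (1.0 - lam_c)
    lam_psg = (1.0 - h) * (1.0 - lam_c)
    score = 0.0
    for w in query:
        p = (lam_psg * p_psg.get(w, 0.0)
             + lam_doc * p_doc.get(w, 0.0)
             + lam_c * coll_lm.get(w, FLOOR))
        score += math.log(p)
    return score


def doc_log_likelihood(query, doc, coll_lm, lam_c=0.3):
    """log p(q | d) with Jelinek-Mercer smoothing against the collection."""
    p_doc = mle(doc)
    return sum(
        math.log((1.0 - lam_c) * p_doc.get(w, 0.0) + lam_c * coll_lm.get(w, FLOOR))
        for w in query
    )


def score_document(query, doc, coll_lm, h, size=50, lam_c=0.3):
    """Assumed interpolation: h * doc score + (1 - h) * best passage score."""
    best_psg = max(
        passage_log_likelihood(query, g, doc, coll_lm, h, lam_c)
        for g in windows(doc, size)
    )
    return h * doc_log_likelihood(query, doc, coll_lm, lam_c) + (1.0 - h) * best_psg


# Toy usage on a two-document collection (illustrative data only).
docs = {
    "d1": "passage retrieval language model passage homogeneity".split(),
    "d2": "cooking recipes pasta sauce tomato basil".split(),
}
coll = mle([w for d in docs.values() for w in d])
lens = [len(d) for d in docs.values()]
query = "passage model".split()
for name, d in docs.items():
    h = length_homogeneity(len(d), min(lens), max(lens))
    print(name, score_document(query, d, coll, h, size=3))
```

Tying the document-model weight to `h` means a document judged homogeneous contributes more of its own term statistics to each passage, while a heterogeneous one leaves the passage model closer to the passage text itself; other homogeneity measures could be swapped into `length_homogeneity`'s role without changing the rest of the sketch.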

Original language: English
Pages (from-to): 157-187
Number of pages: 31
Journal: Information Retrieval
Volume: 13
Issue number: 2
State: Published, April 2010

Keywords

  • Ad hoc document retrieval
  • Document homogeneity
  • Passage-based language models
  • Passage-based relevance models
  • Relevance models

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences
