TY - JOUR
T1 - Utilizing passage-based language models for ad hoc document retrieval
AU - Bendersky, Michael
AU - Kurland, Oren
N1 - Funding Information:
Acknowledgments: This paper is based upon work done in part while the first author was at the Technion and the second author was at Cornell University. The work presented here was supported in part by Google’s and IBM’s faculty research awards, by the Center for Intelligent Information Retrieval, and by the National Science Foundation under grant no. IIS-0329064. Any opinions, findings and conclusions or recommendations expressed in this material are the authors’ and do not necessarily reflect those of the sponsoring institutions.
PY - 2010/4
Y1 - 2010/4
N2 - To cope with the fact that, in the ad hoc retrieval setting, documents relevant to a query could contain very few (short) parts (passages) with query-related information, researchers proposed passage-based document ranking approaches. We show that several of these retrieval methods can be understood, and new ones can be derived, using the same probabilistic model. We use language-model estimates to instantiate specific retrieval algorithms, and in doing so present a novel passage language model that integrates information from the containing document to an extent controlled by the estimated document homogeneity. Several document-homogeneity measures that we present yield passage language models that are more effective than the standard passage model for basic document retrieval and for constructing and utilizing passage-based relevance models; these relevance models also outperform a document-based relevance model. Finally, we demonstrate the merits in using the document-homogeneity measures for integrating document-query and passage-query similarity information for document retrieval.
KW - Ad hoc document retrieval
KW - Document homogeneity
KW - Passage-based language models
KW - Passage-based relevance models
KW - Relevance models
UR - http://www.scopus.com/inward/record.url?scp=77954088608&partnerID=8YFLogxK
U2 - 10.1007/s10791-009-9118-8
DO - 10.1007/s10791-009-9118-8
M3 - Article
AN - SCOPUS:77954088608
SN - 1386-4564
VL - 13
SP - 157
EP - 187
JO - Information Retrieval
JF - Information Retrieval
IS - 2
ER -