TY - JOUR
T1 - A study of the integration of passage-, document-, and cluster-based information for re-ranking search results
AU - Krikon, Eyal
AU - Kurland, Oren
N1 - Funding Information:
Acknowledgments: We thank the reviewers for their helpful comments. This paper is based upon work supported in part by Israel’s Science Foundation under grant no. 890015, and by IBM’s SUR award. Any
PY - 2011/12
Y1 - 2011/12
AB - Cluster-based and passage-based document retrieval paradigms have been shown to be effective. While the former are based on utilizing query-related corpus context manifested in clusters of similar documents, the latter address the fact that a document can be relevant even if only a very small part of it contains query-pertaining information. Hence, cluster-based approaches can be viewed as "expanding" the document representation, while passage-based approaches can be thought of as utilizing a "contracted" document representation. We present a study of the relative benefits of each of these two approaches, and of the potential merits of their integration. To that end, we devise two methods that integrate whole-document-based, cluster-based and passage-based information. The methods are applied to the re-ranking task, that is, re-ordering documents in an initially retrieved list so as to improve precision at the very top ranks. Extensive empirical evaluation attests to the potential merits of integrating these information types. Specifically, the resulting performance substantially exceeds that of the initial ranking and is often better than that of a state-of-the-art pseudo-feedback-based query expansion approach.
KW - Ad hoc retrieval
KW - Cluster-based language models
KW - Clusters
KW - Passage-based language models
KW - Passages
KW - Re-ranking
UR - http://www.scopus.com/inward/record.url?scp=80255131400&partnerID=8YFLogxK
U2 - 10.1007/s10791-011-9168-6
DO - 10.1007/s10791-011-9168-6
M3 - Article
AN - SCOPUS:80255131400
SN - 1386-4564
VL - 14
SP - 593
EP - 616
JO - Information Retrieval
JF - Information Retrieval
IS - 6
ER -