TY - JOUR
T1 - Re-ranking search results using language models of query-specific clusters
AU - Kurland, Oren
N1 - Funding Information:
Acknowledgments: The author thanks Lillian Lee for many valuable discussions and comments. Part of the work described in this paper was done while the author was at Cornell University. The paper is based upon work supported in part by the National Science Foundation under grant no. IIS-0329064 and by a research award from Google. Any opinions, findings, and conclusions or recommendations expressed are those of the author and do not necessarily reflect the views of any sponsoring institutions or the U.S. government.
PY - 2009/8
Y1 - 2009/8
AB - To obtain high precision at top ranks by a search performed in response to a query, researchers have proposed a cluster-based re-ranking paradigm: clustering an initial list of documents that are the most highly ranked by some initial search, and using information induced from these (often called) query-specific clusters for re-ranking the list. However, results concerning the effectiveness of various automatic cluster-based re-ranking methods have been inconclusive. We show that using query-specific clusters for automatic re-ranking of top-retrieved documents is effective with several methods in which clusters play different roles, among which is the smoothing of document language models. We do so by adapting previously-proposed cluster-based retrieval approaches, which are based on (static) query-independent clusters for ranking all documents in a corpus, to the re-ranking setting wherein clusters are query-specific. The best performing method that we develop outperforms both the initial document-based ranking and some previously proposed cluster-based re-ranking approaches; furthermore, this algorithm consistently outperforms a state-of-the-art pseudo-feedback-based approach. In further exploration we study the performance of cluster-based smoothing methods for re-ranking with various (soft and hard) clustering algorithms, and demonstrate the importance of clusters in providing context from the initial list through a comparison to using single documents to this end.
KW - Cluster-based language models
KW - Cluster-based re-ranking
KW - Cluster-based smoothing
KW - Query-specific clusters
UR - http://www.scopus.com/inward/record.url?scp=67651040478&partnerID=8YFLogxK
DO - 10.1007/s10791-008-9065-9
M3 - Article
AN - SCOPUS:67651040478
SN - 1386-4564
VL - 12
SP - 437
EP - 460
JO - Information Retrieval
JF - Information Retrieval
IS - 4
ER -