Abstract
The ad hoc retrieval task is to find documents in a corpus that are relevant to a query. Inspired by the PageRank and HITS (hubs and authorities) algorithms for Web search, we propose a structural reranking approach to ad hoc retrieval that applies to settings with no hyperlink information. We reorder the documents in an initially retrieved set by exploiting implicit asymmetric relationships among them. We consider generation links, which indicate that the language model induced from one document assigns high probability to the text of another. We study a number of reranking criteria based on measures of centrality in the graphs formed by generation links, and show that integrating centrality into standard language-model-based retrieval is quite effective at improving precision at top ranks; the best resulting performance is comparable, and often superior, to that of a state-of-the-art pseudo-feedback-based retrieval approach. In addition, we demonstrate the merits of our language-model-based method for inducing interdocument links by comparing it to previously suggested notions of interdocument similarity (e.g., cosines within the vector-space model). We also show that our methods for inducing centrality are substantially more effective than approaches based on document-specific characteristics, several of which are novel to this study.
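The pipeline summarized above (induce generation links, measure centrality, combine it with a language-model retrieval score) can be illustrated with a minimal sketch. It is not the article's implementation: the abstract does not specify the estimation details, so the Dirichlet-smoothed unigram models, the `top_k` cutoff, the edge direction (from a document to the documents whose models best generate it), the PageRank damping factor, and the product combination with query likelihood are all assumptions, and every function name is hypothetical.

```python
import math
from collections import Counter


def build_models(docs, mu=10.0):
    """Return one Dirichlet-smoothed unigram model (term -> probability) per document.

    Assumption: the abstract does not name a smoothing method; mu is illustrative.
    """
    corpus_counts = Counter(t for d in docs for t in d)
    corpus_len = sum(corpus_counts.values())

    def make(tokens):
        counts, n = Counter(tokens), len(tokens)
        return lambda term: (counts[term] + mu * corpus_counts[term] / corpus_len) / (n + mu)

    return [make(d) for d in docs]


def generation_graph(docs, models, top_k=2):
    """weights[i][g] > 0 when document g is among the top_k generators of document i.

    Assumption: edges point from the generated document to its generators, and the
    weight is the length-normalized likelihood. Documents must be non-empty.
    """
    n = len(docs)
    weights = [[0.0] * n for _ in range(n)]
    for i, text in enumerate(docs):
        scored = []
        for g in range(n):
            if g == i:
                continue
            avg_log_lik = sum(math.log(models[g](t)) for t in text) / len(text)
            scored.append((avg_log_lik, g))
        for avg_log_lik, g in sorted(scored, reverse=True)[:top_k]:
            weights[i][g] = math.exp(avg_log_lik)
    return weights


def pagerank(weights, damping=0.85, iters=50):
    """Weighted PageRank by power iteration; rows without out-links spread uniformly."""
    n = len(weights)
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for i, row in enumerate(weights):
            total = sum(row)
            for j in range(n):
                share = row[j] / total if total > 0 else 1.0 / n
                new[j] += damping * scores[i] * share
        scores = new
    return scores


def rerank(docs, query, top_k=2, mu=10.0):
    """Order document indices by centrality times query likelihood (an assumed combination)."""
    models = build_models(docs, mu)
    centrality = pagerank(generation_graph(docs, models, top_k))
    final = [centrality[i] * math.prod(models[i](t) for t in query) for i in range(len(docs))]
    return sorted(range(len(docs)), key=lambda i: final[i], reverse=True)


# Toy "initially retrieved set": each document is a list of tokens.
docs = [
    ["graph", "rank", "link"],
    ["graph", "rank", "page"],
    ["cat", "dog"],
    ["graph", "link", "page", "rank"],
]
print(rerank(docs, ["graph", "rank"]))
```

In this sketch a document gains centrality when many other documents in the retrieved set are well explained by its language model, which mirrors how PageRank rewards pages that receive many links. Replacing `pagerank` with a HITS-style hubs-and-authorities iteration over the same `weights` matrix would be the natural variation to try.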
Original language | English |
---|---|
Article number | 18 |
Journal | ACM Transactions on Information Systems |
Volume | 28 |
Issue number | 4 |
State | Published - Nov 2010 |
Externally published | Yes |
Keywords
- Authorities
- Graph-based retrieval
- High-accuracy retrieval
- HITS
- Hubs
- Language modeling
- PageRank
- Social networks
- Structural reranking
ASJC Scopus subject areas
- Information Systems
- General Business, Management and Accounting
- Computer Science Applications