PageRank without hyperlinks: Structural reranking using links induced by language models

Research output: Contribution to journal › Article › peer-review

Abstract

The ad hoc retrieval task is to find documents in a corpus that are relevant to a query. Inspired by the PageRank and HITS (hubs and authorities) algorithms for Web search, we propose a structural reranking approach to ad hoc retrieval that applies to settings with no hyperlink information. We reorder the documents in an initially retrieved set by exploiting implicit asymmetric relationships among them. We consider generation links, which indicate that the language model induced from one document assigns high probability to the text of another. We study a number of reranking criteria based on measures of centrality in the graphs formed by generation links, and show that integrating centrality into standard language-model-based retrieval is quite effective at improving precision at top ranks; the best resultant performance is comparable, and often superior, to that of a state-of-the-art pseudo-feedback-based retrieval approach. In addition, we demonstrate the merits of our language-model-based method for inducing interdocument links by comparing it to previously suggested notions of interdocument similarities (e.g., cosines within the vector-space model). We also show that our methods for inducing centrality are substantially more effective than approaches based on document-specific characteristics, several of which are novel to this study.
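
The idea of generation links and graph centrality can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the estimation details, so the Dirichlet-smoothed unigram models, the `top_k` cutoff, the link direction (from a document to the documents whose models best generate its text), the edge weights, and all function names below are assumptions chosen for illustration.

```python
import math
from collections import Counter


def unigram_lm(tokens, coll_counts, coll_len, mu=10.0):
    """Return a Dirichlet-smoothed unigram model (assumed smoothing choice)."""
    counts = Counter(tokens)
    n = len(tokens)

    def prob(term):
        p_coll = coll_counts[term] / coll_len
        return (counts[term] + mu * p_coll) / (n + mu)

    return prob


def generation_graph(docs, top_k=2, mu=10.0):
    """Link each document o to the top_k documents whose models best generate o.

    Edge weight is the geometric-mean per-term probability (an assumption).
    Documents are assumed to be non-empty token lists.
    """
    coll = Counter(t for d in docs for t in d)
    coll_len = sum(coll.values())
    models = [unigram_lm(d, coll, coll_len, mu) for d in docs]
    edges = {}
    for o, text in enumerate(docs):
        scored = []
        for g, model in enumerate(models):
            if g == o:
                continue  # no self-links
            # Length-normalized log-likelihood of o's text under g's model.
            ll = sum(math.log(model(t)) for t in text) / len(text)
            scored.append((ll, g))
        scored.sort(reverse=True)
        edges[o] = {g: math.exp(ll) for ll, g in scored[:top_k]}
    return edges


def pagerank(edges, n, damping=0.85, iters=100):
    """Weighted PageRank by power iteration over the generation graph."""
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for o, out in edges.items():
            total = sum(out.values())
            for g, w in out.items():
                new[g] += damping * scores[o] * w / total
        scores = new
    return scores


# Toy "initially retrieved set" (hypothetical data).
docs = [
    ["language", "model", "retrieval", "ranking"],
    ["language", "model"],
    ["retrieval", "ranking"],
    ["cooking", "pasta", "recipe"],
]
edges = generation_graph(docs, top_k=2)
scores = pagerank(edges, len(docs))
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

In this sketch a document earns a high centrality score when its language model is among the best generators of many other documents in the set. To follow the abstract's description of integrating centrality into language-model-based retrieval, one plausible option (again an assumption) is to combine each document's centrality score with its query-likelihood score before sorting.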

Original language: English
Article number: 18
Journal: ACM Transactions on Information Systems
Volume: 28
Issue number: 4
State: Published - Nov 2010
Externally published: Yes

Keywords

  • Authorities
  • Graph-based retrieval
  • High-accuracy retrieval
  • HITS
  • Hubs
  • Language modeling
  • PageRank
  • Social networks
  • Structural reranking

ASJC Scopus subject areas

  • Information Systems
  • General Business, Management and Accounting
  • Computer Science Applications
