Abstract
Web search engines present, for some queries, a cluster of results from the same specialized domain (“vertical”) on the search results page (SERP). We introduce a comprehensive analysis of the presentation of such clusters from seven different verticals based on the logs of a commercial Web search engine. This analysis reveals several unique characteristics, such as size, rank, and clicks, of result clusters from community question-and-answer websites. The study of the properties of this result cluster, specifically as part of the SERP, has received little attention in previous work. Our analysis also motivates the pursuit of a long-standing challenge in ad hoc retrieval, namely, selective cluster retrieval. In our setting, the specific challenge is to select for presentation the documents most highly ranked either by a cluster-based approach (those in the top-retrieved cluster) or by a document-based approach. We address this classification task by representing queries with features based on those utilized for ranking the clusters, query-performance predictors, and properties of the document-clustering structure. Empirical evaluation performed with TREC data shows that our approach outperforms a recently proposed state-of-the-art cluster-based document-retrieval method as well as state-of-the-art document-retrieval methods that do not account for inter-document similarities.
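The selection step described in the abstract can be illustrated with a minimal sketch: a per-query binary decision between the documents of the top-retrieved cluster and the top of a document-based ranking. This is not the authors' implementation. The feature names, the weights, the bias, the threshold, and the logistic form of the classifier are all assumptions made for illustration; the abstract states only that features are drawn from cluster-ranking signals, query-performance predictors, and properties of the clustering structure.

```python
import math
from typing import Dict, List

# Hypothetical feature weights (assumption): in practice these would be
# learned from training queries labeled by which result list performed better.
WEIGHTS: Dict[str, float] = {
    "top_cluster_score": 1.5,   # score the cluster ranker assigned to the top cluster
    "query_performance": -1.0,  # predicted quality of the document-based ranking
    "cluster_cohesion": 0.8,    # a property of the document-clustering structure
}
BIAS = -0.5  # hypothetical intercept


def cluster_probability(features: Dict[str, float]) -> float:
    """Logistic estimate that the cluster-based result list is the better choice."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def select_results(
    features: Dict[str, float],
    top_cluster_docs: List[str],
    document_ranking: List[str],
    k: int = 5,
    threshold: float = 0.5,
) -> List[str]:
    """Return the top-k documents from whichever approach the classifier picks."""
    if cluster_probability(features) >= threshold:
        return top_cluster_docs[:k]
    return document_ranking[:k]


if __name__ == "__main__":
    query_features = {
        "top_cluster_score": 1.0,
        "query_performance": 0.2,
        "cluster_cohesion": 0.5,
    }
    print(select_results(query_features, ["c1", "c2", "c3"], ["d1", "d2", "d3"], k=2))
```

With these illustrative weights, a query whose top cluster scores highly and whose document ranking is predicted to be weak is routed to the cluster-based list; the opposite profile falls back to the document-based ranking.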
Original language | English |
---|---|
Article number | 28 |
Journal | ACM Transactions on Information Systems |
Volume | 36 |
Issue number | 3 |
DOIs | |
State | Published - 2018 |
Keywords
- Aggregated search
- Cluster-based retrieval
ASJC Scopus subject areas
- Information Systems
- General Business, Management and Accounting
- Computer Science Applications