Shame to be Sham: Addressing content-based grey hat search engine optimization

Fiana Raiber, Kevyn Collins-Thompson, Oren Kurland

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

We present an initial study identifying a form of content-based grey hat search engine optimization in which a Web page contains both potentially relevant content and manipulated content. We call such pages sham documents, because they lie in the grey area between 'ham' (clearly normal) and 'spam' (clearly fake). Sham documents are often ranked artificially high in response to certain queries, yet they may also contain some useful information and so cannot be considered outright spam. We report a novel annotation effort performed with the ClueWeb09 benchmark, in which pages were labeled as spam, sham, or legitimate content. Significant inter-annotator agreement rates support the claim that there are sham documents that are highly ranked by a very effective retrieval approach, yet are not spam. We also present an initial study of predictors that may indicate whether a query is the target of shamming.

Original language: English
Title of host publication: SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages: 1013-1016
Number of pages: 4
State: Published - 2013
Event: 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013 - Dublin, Ireland
Duration: 28 Jul 2013 to 1 Aug 2013

Publication series

Name: SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference: 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013
Country/Territory: Ireland
City: Dublin
Period: 28/07/13 to 1/08/13

Keywords

  • Search engine optimization
  • Sham
  • Spam

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Information Systems
