TY - GEN
T1 - Shame to be Sham
T2 - 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013
AU - Raiber, Fiana
AU - Collins-Thompson, Kevyn
AU - Kurland, Oren
PY - 2013
Y1 - 2013
N2 - We present an initial study identifying a form of content-based grey hat search engine optimization, in which a Web page contains both potentially relevant content and manipulated content: we call such pages sham documents, because they lie in the grey area between 'ham' (clearly normal) and 'spam' (clearly fake). Sham documents are often ranked artificially high in response to certain queries, but also may contain some useful information and cannot be considered as absolute spam. We report a novel annotation effort performed with the ClueWeb09 benchmark where pages were labeled as being spam, sham, or legitimate content. Significant inter-annotator agreement rates support the claim that there are sham documents that are highly ranked by a very effective retrieval approach, yet are not spam. We also present an initial study of predictors that may indicate whether a query is the target of shamming.
AB - We present an initial study identifying a form of content-based grey hat search engine optimization, in which a Web page contains both potentially relevant content and manipulated content: we call such pages sham documents, because they lie in the grey area between 'ham' (clearly normal) and 'spam' (clearly fake). Sham documents are often ranked artificially high in response to certain queries, but also may contain some useful information and cannot be considered as absolute spam. We report a novel annotation effort performed with the ClueWeb09 benchmark where pages were labeled as being spam, sham, or legitimate content. Significant inter-annotator agreement rates support the claim that there are sham documents that are highly ranked by a very effective retrieval approach, yet are not spam. We also present an initial study of predictors that may indicate whether a query is the target of shamming.
KW - Search engine optimization
KW - Sham
KW - Spam
UR - http://www.scopus.com/inward/record.url?scp=84883059358&partnerID=8YFLogxK
U2 - 10.1145/2484028.2484135
DO - 10.1145/2484028.2484135
M3 - Conference contribution
AN - SCOPUS:84883059358
SN - 9781450320344
T3 - SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 1013
EP - 1016
BT - SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval
Y2 - 28 July 2013 through 1 August 2013
ER -