Old IR Methods Meet RAG

Oz Huly, Idan Pogrebinsky, David Carmel, Oren Kurland, Yoelle Maarek

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Retrieval augmented generation (RAG) is an important approach to provide large language models (LLMs) with context pertaining to the text generation task: given a prompt, passages are retrieved from external corpora to ground the generation with more relevant and/or fresher data. Most previous studies used dense retrieval methods for applying RAG in question answering scenarios. However, recent work showed that traditional information retrieval methods (a.k.a. sparse methods) can do as well as or even better than dense retrieval ones. In particular, it was shown that Okapi BM25 outperforms dense retrieval methods, in terms of perplexity, for the fundamental text completion task in LLMs. We extend this study and show, using two popular LLMs, that a broad set of sparse retrieval methods achieve better results than all the dense retrieval methods we experimented with, for varying lengths of queries induced from the prompt. Furthermore, we found that Okapi BM25 is substantially outperformed by a term-proximity retrieval method (MRF), which is in turn outperformed by a pseudo-feedback-based bag-of-terms approach (relevance model). Additional exploration sheds some light on the effectiveness of lexical retrieval methods for RAG. Our findings call for further study of classical retrieval methods for RAG.

Original languageEnglish
Title of host publicationSIGIR 2024 - Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages2559-2563
Number of pages5
ISBN (Electronic)9798400704314
DOIs
StatePublished - 10 Jul 2024
Event47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024 - Washington, United States
Duration: 14 Jul 202418 Jul 2024

Publication series

NameSIGIR 2024 - Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024
Country/TerritoryUnited States
CityWashington
Period14/07/2418/07/24

Keywords

  • retrieval augmented generation

ASJC Scopus subject areas

  • Information Systems
  • Software

Fingerprint

Dive into the research topics of 'Old IR Methods Meet RAG'. Together they form a unique fingerprint.

Cite this