Explaining Agent Preferences & Behavior: Integrating Reward-Decomposition & Contrastive-Highlights

Yael Septon, Yotam Amitai, Ofra Amir

Research output: Contribution to journal › Conference article › peer-review

Abstract

Explainable reinforcement learning methods aim to help elucidate agent policies and their underlying decision-making processes. One such method is reward decomposition, which aims to reveal an agent's preferences in a specific world-state by presenting its expected utility decomposed into the different components of the reward function. While this approach quantifies the expected decomposed rewards for alternative actions, it does not demonstrate the outcomes of these alternative actions in terms of the agent's behavior. This work introduces “Contrastive Highlights”, a novel local explanation method that visually compares the agent's chosen behavior to an alternative choice of action in a contrastive manner. We conducted user studies comparing participants' understanding of agents' preferences based on either reward decomposition, contrastive highlights, or a combination of both approaches. Our results show that integrating reward decomposition with contrastive highlights significantly improved participants' performance compared to using each of the approaches separately.
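The reward-decomposition idea summarized above can be illustrated with a small sketch. It assumes the common formulation in which each reward component c has its own Q-value Q_c(s, a) and the agent acts greedily on their sum; the helper names (`total_q`, `greedy_action`, `reward_difference`) and the action names, component names, and numbers are hypothetical and are not taken from the paper's agents or environments.

```python
# A minimal reward-decomposition sketch. The component names, actions, and
# Q-values below are hypothetical, chosen only for illustration; this is not
# the authors' implementation.
from typing import Dict

# q[action][component] = expected discounted return of that reward component
DecomposedQ = Dict[str, Dict[str, float]]


def total_q(q: DecomposedQ) -> Dict[str, float]:
    """Total Q-value per action: the sum of its per-component Q-values."""
    return {action: sum(parts.values()) for action, parts in q.items()}


def greedy_action(q: DecomposedQ) -> str:
    """Action with the highest total expected utility."""
    totals = total_q(q)
    return max(totals, key=totals.get)


def reward_difference(q: DecomposedQ, chosen: str, alternative: str) -> Dict[str, float]:
    """Per-component difference Q_c(s, chosen) - Q_c(s, alternative).

    Positive entries are components on which the agent expects to gain by
    taking `chosen`; negative entries are components it is willing to give up.
    """
    return {c: q[chosen][c] - q[alternative][c] for c in q[chosen]}


# Hypothetical decomposed Q-values for a single world-state.
q_state: DecomposedQ = {
    "left": {"progress": 2.0, "safety": 1.5, "fuel": -0.5},
    "right": {"progress": 3.0, "safety": -1.0, "fuel": -0.5},
}

chosen = greedy_action(q_state)
print(total_q(q_state))                        # {'left': 3.0, 'right': 1.5}
print(chosen)                                  # left
print(reward_difference(q_state, chosen, "right"))
# {'progress': -1.0, 'safety': 2.5, 'fuel': 0.0}
```

In this toy state the agent prefers `left` because it gives up one unit of expected progress in exchange for 2.5 units of expected safety. Per-component differences of this kind quantify *why* one action is preferred, but, as the abstract notes, they do not show what the agent would actually do afterwards, which is the gap contrastive highlights are meant to fill.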

Original language: English
Pages (from-to): 2295-2297
Number of pages: 3
Journal: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume: 2023-May
State: Published - 2023
Event: 22nd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2023 - London, United Kingdom
Duration: 29 May 2023 to 2 Jun 2023

Keywords

  • Deep Reinforcement Learning
  • Explainable AI
  • Human-AI Interaction

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
