Explaining Agent Preferences & Behavior: Integrating Reward-Decomposition & Contrastive-Highlights

Yael Septon; Yotam Amitai; Ofra Amir

Data and Decision Sciences

Research output: Contribution to journal › Conference article › peer-review

Abstract

Explainable reinforcement learning methods aim to help elucidate agent policies and their underlying decision-making processes. One such method is reward decomposition, which aims to reveal an agent's preferences in a specific world-state by presenting its expected utility decomposed to different components of the reward function. While this approach quantifies the expected decomposed rewards for alternative actions, it does not demonstrate the outcomes of these alternative actions in terms of the behavior of the agent. This work introduces “Contrastive Highlights”, a novel local explanation method that visually compares the agent's chosen behavior to an alternative choice of action in a contrastive manner. We conducted user studies comparing participants' understanding of agents' preferences based on either reward decomposition, contrastive highlights, or a combination of both approaches. Our results show that integrating reward decomposition with contrastive highlights significantly improved participants' performance compared to using each of the approaches separately.
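To make the reward-decomposition idea in the abstract concrete, here is a minimal sketch. It assumes the reward is a sum of named components, so an action's overall expected utility is the sum of its per-component values. The component names (`progress`, `safety`), the action names, and all numbers are hypothetical illustrations, not values or code from the paper.

```python
from typing import Dict

# action -> reward component -> expected return for that component (hypothetical values)
QTable = Dict[str, Dict[str, float]]

def total_q(q: QTable) -> Dict[str, float]:
    """Sum component values per action; assumes the reward is a sum of components."""
    return {action: sum(parts.values()) for action, parts in q.items()}

def component_gaps(q: QTable, chosen: str, alternative: str) -> Dict[str, float]:
    """Per-component advantage of the chosen action over an alternative action."""
    return {c: q[chosen][c] - q[alternative][c] for c in q[chosen]}

# Illustrative decomposed values for a single world-state.
q_values: QTable = {
    "left":  {"progress": 2.0, "safety": -0.5},
    "right": {"progress": 1.0, "safety": 1.0},
}

totals = total_q(q_values)
chosen = max(totals, key=totals.get)           # greedy action under the summed values
gaps = component_gaps(q_values, chosen, "left")  # why the chosen action beats "left"
```

In this toy state the per-component gaps show which reward component drives the preference. As the abstract notes, such numbers alone do not show how the agent's behavior would differ under the alternative action, which is the gap Contrastive Highlights is meant to fill.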

Original language	English
Pages (from-to)	2295-2297
Number of pages	3
Journal	Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume	2023-May
State	Published - 2023
Event	22nd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2023 - London, United Kingdom
Duration	29 May 2023 → 2 Jun 2023

Keywords

Deep Reinforcement Learning
Explainable AI
Human-AI Interaction

ASJC Scopus subject areas

Artificial Intelligence
Software
Control and Systems Engineering

Cite this

@article{8594a70867e347f394dcd3a4cdeeab53,

title = "Explaining Agent Preferences \& Behavior: Integrating Reward-Decomposition \& Contrastive-Highlights",

abstract = "Explainable reinforcement learning methods aim to help elucidate agent policies and their underlying decision-making processes. One such method is reward decomposition, which aims to reveal an agent's preferences in a specific world-state by presenting its expected utility decomposed to different components of the reward function. While this approach quantifies the expected decomposed rewards for alternative actions, it does not demonstrate the outcomes of these alternative actions in terms of the behavior of the agent. This work introduces “Contrastive Highlights”, a novel local explanation method that visually compares the agent's chosen behavior to an alternative choice of action in a contrastive manner. We conducted user studies comparing participants' understanding of agents' preferences based on either reward decomposition, contrastive highlights, or a combination of both approaches. Our results show that integrating reward decomposition with contrastive highlights significantly improved participants' performance compared to using each of the approaches separately.",

keywords = "Deep Reinforcement Learning, Explainable AI, Human-AI Interaction",

author = "Yael Septon and Yotam Amitai and Ofra Amir",

note = "Publisher Copyright: {\textcopyright} 2023 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.; 22nd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2023 ; Conference date: 29-05-2023 Through 02-06-2023",

year = "2023",

language = "English",

volume = "2023-May",

pages = "2295--2297",

journal = "Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS",

issn = "1548-8403",

}

TY - JOUR

T1 - Explaining Agent Preferences & Behavior: Integrating Reward-Decomposition & Contrastive-Highlights

T2 - 22nd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2023

AU - Septon, Yael

AU - Amitai, Yotam

AU - Amir, Ofra

PY - 2023

Y1 - 2023

AB - Explainable reinforcement learning methods aim to help elucidate agent policies and their underlying decision-making processes. One such method is reward decomposition, which aims to reveal an agent's preferences in a specific world-state by presenting its expected utility decomposed to different components of the reward function. While this approach quantifies the expected decomposed rewards for alternative actions, it does not demonstrate the outcomes of these alternative actions in terms of the behavior of the agent. This work introduces “Contrastive Highlights”, a novel local explanation method that visually compares the agent's chosen behavior to an alternative choice of action in a contrastive manner. We conducted user studies comparing participants' understanding of agents' preferences based on either reward decomposition, contrastive highlights, or a combination of both approaches. Our results show that integrating reward decomposition with contrastive highlights significantly improved participants' performance compared to using each of the approaches separately.

KW - Deep Reinforcement Learning

KW - Explainable AI

KW - Human-AI Interaction

UR - http://www.scopus.com/inward/record.url?scp=85171280065&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85171280065

SN - 1548-8403

VL - 2023-May

SP - 2295

EP - 2297

JO - Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS

JF - Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS

Y2 - 29 May 2023 through 2 June 2023

ER -
