Explaining Reinforcement Learning Agents through Counterfactual Action Outcomes

Yotam Amitai; Yael Septon; Ofra Amir

doi:10.1609/aaai.v38i9.28863

Explaining Reinforcement Learning Agents through Counterfactual Action Outcomes

Yotam Amitai, Yael Septon, Ofra Amir

Data and Decision Sciences

Research output: Contribution to journal › Conference article › peer-review

Abstract

Explainable reinforcement learning (XRL) methods aim to help elucidate agent policies and decision-making processes. The majority of XRL approaches focus on local explanations, seeking to shed light on the reasons an agent acts the way it does at a specific world state. While such explanations are both useful and necessary, they typically do not portray the outcomes of the agent's selected choice of action. In this work, we propose “COViz”, a new local explanation method that visually compares the outcome of an agent's chosen action to a counterfactual one. In contrast to most local explanations that provide state-limited observations of the agent's motivation, our method depicts alternative trajectories the agent could have taken from the given state and their outcomes. We evaluated the usefulness of COViz in supporting people's understanding of agents' preferences and compare it with reward decomposition, a local explanation method that describes an agent's expected utility for different actions by decomposing it into meaningful reward types. Furthermore, we examine the complementary benefits of integrating both methods. Our results show that such integration significantly improved participants' performance.

Original language	English
Pages (from-to)	10003-10011
Number of pages	9
Journal	Proceedings of the AAAI Conference on Artificial Intelligence
Volume	38
Issue number	9
DOIs	https://doi.org/10.1609/aaai.v38i9.28863
State	Published - 25 Mar 2024
Event	38th AAAI Conference on Artificial Intelligence, AAAI 2024 - Vancouver, Canada Duration: 20 Feb 2024 → 27 Feb 2024

ASJC Scopus subject areas

Artificial Intelligence

Access to Document

10.1609/aaai.v38i9.28863

Cite this

@article{c2a489cf51df422a85c9665eb3a51065,

title = "Explaining Reinforcement Learning Agents through Counterfactual Action Outcomes",

abstract = "Explainable reinforcement learning (XRL) methods aim to help elucidate agent policies and decision-making processes. The majority of XRL approaches focus on local explanations, seeking to shed light on the reasons an agent acts the way it does at a specific world state. While such explanations are both useful and necessary, they typically do not portray the outcomes of the agent's selected choice of action. In this work, we propose “COViz”, a new local explanation method that visually compares the outcome of an agent's chosen action to a counterfactual one. In contrast to most local explanations that provide state-limited observations of the agent's motivation, our method depicts alternative trajectories the agent could have taken from the given state and their outcomes. We evaluated the usefulness of COViz in supporting people's understanding of agents' preferences and compare it with reward decomposition, a local explanation method that describes an agent's expected utility for different actions by decomposing it into meaningful reward types. Furthermore, we examine the complementary benefits of integrating both methods. Our results show that such integration significantly improved participants' performance.",

author = "Yotam Amitai and Yael Septon and Ofra Amir",

note = "Publisher Copyright: Copyright {\textcopyright} 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.; 38th AAAI Conference on Artificial Intelligence, AAAI 2024 ; Conference date: 20-02-2024 Through 27-02-2024",

year = "2024",

month = mar,

day = "25",

doi = "10.1609/aaai.v38i9.28863",

language = "אנגלית",

volume = "38",

pages = "10003--10011",

number = "9",

}

TY - JOUR

T1 - Explaining Reinforcement Learning Agents through Counterfactual Action Outcomes

AU - Amitai, Yotam

AU - Septon, Yael

AU - Amir, Ofra

PY - 2024/3/25

Y1 - 2024/3/25

N2 - Explainable reinforcement learning (XRL) methods aim to help elucidate agent policies and decision-making processes. The majority of XRL approaches focus on local explanations, seeking to shed light on the reasons an agent acts the way it does at a specific world state. While such explanations are both useful and necessary, they typically do not portray the outcomes of the agent's selected choice of action. In this work, we propose “COViz”, a new local explanation method that visually compares the outcome of an agent's chosen action to a counterfactual one. In contrast to most local explanations that provide state-limited observations of the agent's motivation, our method depicts alternative trajectories the agent could have taken from the given state and their outcomes. We evaluated the usefulness of COViz in supporting people's understanding of agents' preferences and compare it with reward decomposition, a local explanation method that describes an agent's expected utility for different actions by decomposing it into meaningful reward types. Furthermore, we examine the complementary benefits of integrating both methods. Our results show that such integration significantly improved participants' performance.

AB - Explainable reinforcement learning (XRL) methods aim to help elucidate agent policies and decision-making processes. The majority of XRL approaches focus on local explanations, seeking to shed light on the reasons an agent acts the way it does at a specific world state. While such explanations are both useful and necessary, they typically do not portray the outcomes of the agent's selected choice of action. In this work, we propose “COViz”, a new local explanation method that visually compares the outcome of an agent's chosen action to a counterfactual one. In contrast to most local explanations that provide state-limited observations of the agent's motivation, our method depicts alternative trajectories the agent could have taken from the given state and their outcomes. We evaluated the usefulness of COViz in supporting people's understanding of agents' preferences and compare it with reward decomposition, a local explanation method that describes an agent's expected utility for different actions by decomposing it into meaningful reward types. Furthermore, we examine the complementary benefits of integrating both methods. Our results show that such integration significantly improved participants' performance.

UR - http://www.scopus.com/inward/record.url?scp=85189297737&partnerID=8YFLogxK

U2 - 10.1609/aaai.v38i9.28863

DO - 10.1609/aaai.v38i9.28863

M3 - ???researchoutput.researchoutputtypes.contributiontojournal.conferencearticle???

AN - SCOPUS:85189297737

SN - 2159-5399

VL - 38

SP - 10003

EP - 10011

JO - Proceedings of the AAAI Conference on Artificial Intelligence

JF - Proceedings of the AAAI Conference on Artificial Intelligence

IS - 9

T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024

Y2 - 20 February 2024 through 27 February 2024

ER -

Explaining Reinforcement Learning Agents through Counterfactual Action Outcomes

Abstract

ASJC Scopus subject areas

Access to Document

Other files and links

Fingerprint

Cite this