Principal-Agent Reward Shaping in MDPs

Omer Ben-Porat, Yishay Mansour, Michal Moshkovitz, Boaz Taitler

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Principal-agent problems arise when one party acts on behalf of another, leading to conflicts of interest. The economic literature has extensively studied principal-agent problems, and recent work has extended this to more complex scenarios such as Markov Decision Processes (MDPs). In this paper, we further explore this line of research by investigating how reward shaping under budget constraints can improve the principal's utility. We study a two-player Stackelberg game where the principal and the agent have different reward functions, and the agent chooses an MDP policy for both players. The principal offers an additional reward to the agent, and the agent picks their policy selfishly to maximize their reward, which is the sum of the original and the offered reward. Our results establish the NP-hardness of the problem and offer polynomial-time approximation algorithms for two classes of instances: stochastic trees and deterministic decision processes with a finite horizon.
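To make the setting concrete, below is a minimal Python sketch of the interaction the abstract describes: the principal commits to a bonus on top of the agent's reward, the agent best-responds with a policy maximizing its original-plus-bonus reward, and the principal's utility is evaluated under that policy. The toy deterministic decision process, the payment model, and the brute-force grid search over bonuses are all illustrative assumptions, not the paper's constructions (the paper gives approximation algorithms, since the general problem is NP-hard).

```python
# Illustrative sketch only: toy instance and names are assumptions,
# not the paper's algorithms or benchmarks.

# Deterministic transitions: (state, action) -> next state.
T = {
    ("s0", "a"): "s1", ("s0", "b"): "s2",
    ("s1", "a"): "s1", ("s1", "b"): "s1",
    ("s2", "a"): "s2", ("s2", "b"): "s2",
}
ACTIONS = ["a", "b"]
HORIZON = 2

# The players' rewards conflict: the agent mildly prefers "b" at s0,
# while the principal strongly prefers that the agent take "a" there.
R_AGENT = {("s0", "a"): 0.0, ("s0", "b"): 1.0,
           ("s1", "a"): 1.0, ("s1", "b"): 1.0,
           ("s2", "a"): 1.0, ("s2", "b"): 1.0}
R_PRINCIPAL = {("s0", "a"): 5.0, ("s0", "b"): 0.0,
               ("s1", "a"): 1.0, ("s1", "b"): 1.0,
               ("s2", "a"): 0.0, ("s2", "b"): 0.0}

def best_response(bonus):
    """Agent's optimal (time-dependent) policy for R_AGENT + bonus,
    computed by backward induction over the finite horizon."""
    states = {s for s, _ in T}
    V = {s: 0.0 for s in states}  # value-to-go after the final step
    steps = []
    for _ in range(HORIZON):
        Q = {s: {a: R_AGENT[(s, a)] + bonus.get((s, a), 0.0) + V[T[(s, a)]]
                 for a in ACTIONS} for s in states}
        pi = {s: max(Q[s], key=Q[s].get) for s in states}  # ties -> first action
        V = {s: Q[s][pi[s]] for s in states}
        steps.append(pi)
    steps.reverse()  # steps[t] is the agent's decision rule at time t
    return steps

def principal_utility(policy, bonus):
    """Roll out the agent's policy from s0; the principal collects its own
    reward and pays the promised bonus along the realized trajectory
    (one common modeling choice; the paper's payment model may differ)."""
    s, total = "s0", 0.0
    for pi in policy:
        a = pi[s]
        total += R_PRINCIPAL[(s, a)] - bonus.get((s, a), 0.0)
        s = T[(s, a)]
    return total

# Exhaustive search over a coarse grid of bonus offers under a budget.
# Purely illustrative: the paper proves the general problem NP-hard and
# instead gives polynomial-time approximation algorithms.
BUDGET = 2.0
best_bonus, best_u = {}, float("-inf")
for b in (0.0, 0.5, 1.0, 1.5, 2.0):
    if b > BUDGET:
        continue
    bonus = {("s0", "a"): b}  # shape only the (s0, "a") pair, for simplicity
    u = principal_utility(best_response(bonus), bonus)
    if u > best_u:
        best_bonus, best_u = bonus, u

print("best bonus:", best_bonus, "-> principal utility:", best_u)
# With no bonus the agent plays "b" at s0 (principal utility 0); a bonus
# of 1.0 on (s0, "a") flips the agent's choice and yields utility 5.0.
```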

Original language: English
Title of host publication: Technical Tracks 14
Editors: Michael Wooldridge, Jennifer Dy, Sriraam Natarajan
Pages: 9502-9510
Number of pages: 9
Edition: 9
ISBN (Electronic): 1577358872, 9781577358879
DOIs
State: Published - 25 Mar 2024
Event: 38th AAAI Conference on Artificial Intelligence, AAAI 2024 - Vancouver, Canada
Duration: 20 Feb 2024 – 27 Feb 2024

Publication series

Name: Proceedings of the AAAI Conference on Artificial Intelligence
Number: 9
Volume: 38

Conference

Conference: 38th AAAI Conference on Artificial Intelligence, AAAI 2024
Country/Territory: Canada
City: Vancouver
Period: 20/02/24 – 27/02/24

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
