
Bikramjit Banerjee's Publications


Reactivity and Safe Learning in Multiagent Systems

Bikramjit Banerjee and Jing Peng. Reactivity and Safe Learning in Multiagent Systems. Adaptive Behavior, 14(4):339–356, SAGE, 2006.

Download

[PDF] 

Abstract

Multi-agent reinforcement learning (MRL) is a growing area of research. What makes it particularly challenging is that multiple learners render each other's environments non-stationary. In addition to adapting their behaviors to other learning agents, online learners must also provide assurances about their online performance in order to promote user trust of adaptive agent systems deployed in real-world applications. In this article, instead of developing new algorithms with such assurances, we study the question of safety in the online performance of some existing MRL algorithms. We identify the key notion of reactivity of a learner by analyzing how an algorithm (PHC-Exploiter), designed to exploit some simpler opponents, can itself be exploited by them. We quantify and analyze this concept of reactivity in the context of these algorithms to explain their experimental behaviors. We argue that no learner can be designed that can deliberately avoid exploitation. We also show that any attempt to reduce reactivity makes the learner sensitive to noise, and devise an adaptive method (based on environmental feedback) designed to maximize the learner's safety and minimize its sensitivity to noise.
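The PHC-Exploiter algorithm analyzed in the paper builds on policy hill-climbing (PHC), in which a learner maintains Q-value estimates and nudges its mixed policy toward the currently greedy action. The sketch below shows plain PHC (not the paper's PHC-Exploiter) in a repeated matching-pennies game against a fixed opponent; the opponent's 70/30 strategy, learning rates, and episode count are illustrative assumptions, not values from the paper.

```python
import random

def phc_matching_pennies(episodes=5000, alpha=0.1, delta=0.01, seed=0):
    """Minimal policy hill-climbing (PHC) sketch: the learner plays
    matching pennies (as the matcher) against a fixed opponent that
    chooses 'heads' 70% of the time. The best response is pure 'heads'."""
    rng = random.Random(seed)
    q = [0.0, 0.0]    # Q-value estimate per action (0 = heads, 1 = tails)
    pi = [0.5, 0.5]   # learner's mixed policy over the two actions
    for _ in range(episodes):
        a = 0 if rng.random() < pi[0] else 1       # sample own action
        opp = 0 if rng.random() < 0.7 else 1       # fixed opponent strategy
        r = 1.0 if a == opp else -1.0              # matcher's payoff
        # Q-learning update (stateless repeated game, so no bootstrap term)
        q[a] += alpha * (r - q[a])
        # hill-climbing step: shift probability toward the greedy action
        best = 0 if q[0] >= q[1] else 1
        pi[best] = min(1.0, pi[best] + delta)
        pi[1 - best] = 1.0 - pi[best]
    return pi

pi = phc_matching_pennies()
```

Because PHC keeps chasing the best response to the opponent's recent play, its policy trajectory is what makes it reactive: an opponent that models these incremental shifts can deliberately lead it, which is the exploitation dynamic the paper studies.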

BibTeX

@Article{Banerjee06:Reactivity,
  author    = {Bikramjit Banerjee and Jing Peng},
  title     = {Reactivity and Safe Learning in Multiagent Systems},
  journal   = {Adaptive Behavior},
  year      = {2006},
  volume    = {14},
  number    = {4},
  pages     = {339--356},
  publisher = {SAGE},
  abstract  = {Multi-agent reinforcement learning (MRL) is a growing area
   of research. What makes it particularly challenging is that multiple
   learners render each other's environments non-stationary. In addition
   to adapting their behaviors to other learning agents, online learners
   must also provide assurances about their online performance in order
   to promote user trust of adaptive agent systems deployed in real-world
   applications. In this article, instead of developing new algorithms
   with such assurances, we study the question of safety in the online
   performance of some existing MRL algorithms. We identify the key notion
   of reactivity of a learner by analyzing how an algorithm
   (PHC-Exploiter), designed to exploit some simpler opponents, can itself
   be exploited by them. We quantify and analyze this concept of
   reactivity in the context of these algorithms to explain their
   experimental behaviors. We argue that no learner can be designed that
   can deliberately avoid exploitation. We also show that any attempt to
   reduce reactivity makes the learner sensitive to noise, and devise an
   adaptive method (based on environmental feedback) designed to maximize
   the learner's safety and minimize its sensitivity to noise.},
}

Generated by bib2html.pl (written by Patrick Riley) on Sat May 29, 2021 15:48:22