
Bikramjit Banerjee's Publications


Generalized Multiagent Learning with Performance Bound

Bikramjit Banerjee and Jing Peng. Generalized Multiagent Learning with Performance Bound. Autonomous Agents and Multiagent Systems, 15(3):281–312, Springer, 2007.

Download

[PDF] 

Abstract

We present new multiagent learning (MAL) algorithms with the general philosophy of policy convergence against some classes of opponents but otherwise ensuring high payoffs. We consider a 3-class breakdown of opponent types: (eventually) stationary, self-play and “other” (see Definition 4) agents. We start with ReDVaLeR, which can satisfy policy convergence against the first two types and no-regret against the third, but needs to know the type of the opponents. This serves as a baseline to delineate the difficulty of achieving these goals. We show that a simple modification of ReDVaLeR yields a new algorithm, RV σ(t), that achieves no-regret payoffs in all games, and convergence to Nash equilibria in self-play (and to best response against eventually stationary opponents, a corollary of no-regret) simultaneously, without knowing the opponent types, but in a smaller class of games than ReDVaLeR. RV σ(t) effectively ensures the performance of a learner during the process of learning, as opposed to the performance of a learned behavior. We show that the expression for the regret of RV σ(t) can have a slightly better form than those of other comparable algorithms such as GIGA and GIGA-WoLF, though, in contrast, our analysis is in continuous time. Moreover, experiments show that RV σ(t) can converge to an equilibrium in some cases where GIGA and GIGA-WoLF fail to converge, and to better equilibria in coordination games where GIGA and GIGA-WoLF converge to undesirable equilibria. This important class of coordination games also highlights the key desirability of policy convergence as a criterion for MAL in self-play, rather than high average payoffs. To our knowledge, this is also the first successful (guaranteed) attempt at policy convergence of a no-regret algorithm in the Shapley game.
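
The Shapley game mentioned at the end of the abstract is the classic 3x3 example in which fictitious play, and empirically also plain gradient-based no-regret learners, fails to converge in policy. The sketch below is purely illustrative and is not the paper's RV σ(t) algorithm: it runs a GIGA-style learner (projected online gradient ascent with step size 1/sqrt(t)) in self-play on the conventional Shapley payoff matrices, so the cycling behavior that motivates policy convergence as a criterion can be observed directly. The payoff matrices, step-size schedule, and initial strategies are standard textbook choices, not values taken from the paper.

import numpy as np

def project_to_simplex(v):
    # Euclidean projection onto the probability simplex (standard sort-based method).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

# Shapley's game (conventional form); the unique Nash equilibrium is the
# uniform strategy (1/3, 1/3, 1/3) for both players.
A = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)  # row player's payoffs
B = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=float)  # column player's payoffs

x = np.array([0.8, 0.1, 0.1])  # row player's mixed strategy (arbitrary start)
y = np.array([0.1, 0.8, 0.1])  # column player's mixed strategy (arbitrary start)

for t in range(1, 20001):
    eta = 1.0 / np.sqrt(t)          # GIGA step-size schedule
    grad_x = A @ y                  # gradient of x'Ay with respect to x
    grad_y = B.T @ x                # gradient of x'By with respect to y
    x = project_to_simplex(x + eta * grad_x)
    y = project_to_simplex(y + eta * grad_y)
    if t % 5000 == 0:
        print(f"t={t:6d}  x={np.round(x, 3)}  y={np.round(y, 3)}")

Running this typically shows the two mixed strategies orbiting the uniform equilibrium rather than settling, which is the kind of policy non-convergence of GIGA-style learners that the abstract contrasts with the guaranteed policy convergence of RV σ(t) in this game.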

BibTeX

@Article{Banerjee07:Generalized,
  author =    {Bikramjit Banerjee and Jing Peng},
  title =     {Generalized Multiagent Learning with Performance Bound},
  journal =   {Autonomous Agents and Multiagent Systems},
  year =      {2007},
  volume =    {15},
  number =    {3},
  pages =     {281--312},
  publisher = {Springer},
  abstract =  {We present new multiagent learning (MAL) algorithms with the
   general philosophy of policy convergence against some classes of
   opponents but otherwise ensuring high payoffs. We consider a 3-class
   breakdown of opponent types: (eventually) stationary, self-play and
   “other” (see Definition 4) agents. We start with ReDVaLeR, which can
   satisfy policy convergence against the first two types and no-regret
   against the third, but needs to know the type of the opponents. This
   serves as a baseline to delineate the difficulty of achieving these
   goals. We show that a simple modification of ReDVaLeR yields a new
   algorithm, RV σ(t), that achieves no-regret payoffs in all games, and
   convergence to Nash equilibria in self-play (and to best response
   against eventually stationary opponents, a corollary of no-regret)
   simultaneously, without knowing the opponent types, but in a smaller
   class of games than ReDVaLeR. RV σ(t) effectively ensures the
   performance of a learner during the process of learning, as opposed to
   the performance of a learned behavior. We show that the expression for
   the regret of RV σ(t) can have a slightly better form than those of
   other comparable algorithms such as GIGA and GIGA-WoLF, though, in
   contrast, our analysis is in continuous time. Moreover, experiments
   show that RV σ(t) can converge to an equilibrium in some cases where
   GIGA and GIGA-WoLF fail to converge, and to better equilibria in
   coordination games where GIGA and GIGA-WoLF converge to undesirable
   equilibria. This important class of coordination games also highlights
   the key desirability of policy convergence as a criterion for MAL in
   self-play, rather than high average payoffs. To our knowledge, this is
   also the first successful (guaranteed) attempt at policy convergence of
   a no-regret algorithm in the Shapley game.},
}
