
Bikramjit Banerjee's Publications


Generalized Multiagent Learning with Performance Bound

Bikramjit Banerjee and Jing Peng. Generalized Multiagent Learning with Performance Bound. Autonomous Agents and Multiagent Systems, 15(3):281–312, Springer, 2007.

Download

[PDF] 

Abstract

We present new multiagent learning (MAL) algorithms with the general philosophy of policy convergence against some classes of opponents but otherwise ensuring high payoffs. We consider a 3-class breakdown of opponent types: (eventually) stationary, self-play and “other” (see Definition 4) agents. We start with ReDVaLeR, which can satisfy policy convergence against the first two types and no-regret against the third, but needs to know the type of the opponents. This serves as a baseline to delineate the difficulty of achieving these goals. We show that a simple modification of ReDVaLeR yields a new algorithm, RV σ(t), that achieves no-regret payoffs in all games, and convergence to Nash equilibria in self-play (and to best response against eventually stationary opponents, a corollary of no-regret) simultaneously, without knowing the opponent types, but in a smaller class of games than ReDVaLeR. RV σ(t) effectively ensures the performance of a learner during the process of learning, as opposed to the performance of a learned behavior. We show that the expression for the regret of RV σ(t) can have a slightly better form than those of other comparable algorithms such as GIGA and GIGA-WoLF, though, in contrast, our analysis is in continuous time. Moreover, experiments show that RV σ(t) can converge to an equilibrium in some cases where GIGA and GIGA-WoLF fail to converge, and to better equilibria in coordination games where GIGA and GIGA-WoLF converge to undesirable equilibria. This important class of coordination games also highlights the key desirability of policy convergence as a criterion for MAL in self-play, rather than high average payoffs. To our knowledge, this is also the first successful (guaranteed) attempt at policy convergence of a no-regret algorithm in the Shapley game.
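
The Shapley game mentioned at the end of the abstract is the classic 3x3 example in which fictitious play, and empirically also plain gradient-based no-regret learners, fails to converge in policy. The sketch below is purely illustrative and is not the paper's RV σ(t) algorithm: it runs a GIGA-style learner (projected online gradient ascent with step size 1/sqrt(t)) in self-play on the conventional Shapley payoff matrices, so the cycling behavior that motivates policy convergence as a criterion can be observed directly. The payoff matrices, step-size schedule, and initial strategies are standard textbook choices, not values taken from the paper.

import numpy as np

def project_to_simplex(v):
    # Euclidean projection onto the probability simplex (standard sort-based method).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

# Shapley's game (conventional form); the unique Nash equilibrium is the
# uniform strategy (1/3, 1/3, 1/3) for both players.
A = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)  # row player's payoffs
B = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=float)  # column player's payoffs

x = np.array([0.8, 0.1, 0.1])  # row player's mixed strategy (arbitrary start)
y = np.array([0.1, 0.8, 0.1])  # column player's mixed strategy (arbitrary start)

for t in range(1, 20001):
    eta = 1.0 / np.sqrt(t)          # GIGA step-size schedule
    grad_x = A @ y                  # gradient of x'Ay with respect to x
    grad_y = B.T @ x                # gradient of x'By with respect to y
    x = project_to_simplex(x + eta * grad_x)
    y = project_to_simplex(y + eta * grad_y)
    if t % 5000 == 0:
        print(f"t={t:6d}  x={np.round(x, 3)}  y={np.round(y, 3)}")

Running this typically shows the two mixed strategies orbiting the uniform equilibrium rather than settling, which is the kind of policy non-convergence of GIGA-style learners that the abstract contrasts with the guaranteed policy convergence of RV σ(t) in this game.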

BibTeX

@Article{Banerjee07:Generalized,
  author =    {Bikramjit Banerjee and Jing Peng},
  title =     {Generalized Multiagent Learning with Performance Bound},
  journal =   {Autonomous Agents and Multiagent Systems},
  year =      {2007},
  volume =    {15},
  number =    {3},
  pages =     {281--312},
  publisher = {Springer},
  abstract =  {We present new multiagent learning (MAL) algorithms with the
   general philosophy of policy convergence against some classes of
   opponents but otherwise ensuring high payoffs. We consider a 3-class
   breakdown of opponent types: (eventually) stationary, self-play and
   “other” (see Definition 4) agents. We start with ReDVaLeR, which can
   satisfy policy convergence against the first two types and no-regret
   against the third, but needs to know the type of the opponents. This
   serves as a baseline to delineate the difficulty of achieving these
   goals. We show that a simple modification of ReDVaLeR yields a new
   algorithm, RV σ(t), that achieves no-regret payoffs in all games, and
   convergence to Nash equilibria in self-play (and to best response
   against eventually stationary opponents, a corollary of no-regret)
   simultaneously, without knowing the opponent types, but in a smaller
   class of games than ReDVaLeR. RV σ(t) effectively ensures the
   performance of a learner during the process of learning, as opposed to
   the performance of a learned behavior. We show that the expression for
   the regret of RV σ(t) can have a slightly better form than those of
   other comparable algorithms such as GIGA and GIGA-WoLF, though, in
   contrast, our analysis is in continuous time. Moreover, experiments
   show that RV σ(t) can converge to an equilibrium in some cases where
   GIGA and GIGA-WoLF fail to converge, and to better equilibria in
   coordination games where GIGA and GIGA-WoLF converge to undesirable
   equilibria. This important class of coordination games also highlights
   the key desirability of policy convergence as a criterion for MAL in
   self-play, rather than high average payoffs. To our knowledge, this is
   also the first successful (guaranteed) attempt at policy convergence of
   a no-regret algorithm in the Shapley game.},
}
