
Bikramjit Banerjee's Publications


Strategic best-response learning in multi-agent systems

Bikramjit Banerjee and Jing Peng. Strategic best-response learning in multi-agent systems. Journal of Experimental and Theoretical Artificial Intelligence, 24(2):139–160, Taylor & Francis, 2012.

Download: [PDF]

Abstract

We present a novel and uniform formulation of the problem of reinforcement learning against bounded memory adaptive adversaries in repeated games, and the methodologies to accomplish learning in this novel framework. First we delineate a novel strategic definition of best response that optimises rewards over multiple steps, as opposed to the notion of tactical best response in game theory. We show that the problem of learning a strategic best response reduces to that of learning an optimal policy in a Markov Decision Process (MDP). We deal with both finite and infinite horizon versions of this problem. We adapt an existing Monte Carlo based algorithm for learning optimal policies in such MDPs over finite horizon, in polynomial time. We show that this new efficient algorithm can obtain higher average rewards than a previously known efficient algorithm against some opponents in the contract game. Though this improvement comes at the cost of increased domain knowledge, simple experiments in the Prisoner's Dilemma, and coordination games show that even when no extra domain knowledge (besides that an upper bound on the opponent's memory size is known) is assumed, the error can still be small. We also experiment with a general infinite-horizon learner (using function-approximation to tackle the complexity of history space) against a greedy bounded memory opponent and show that while it can create and exploit opportunities of mutual cooperation in the Prisoner's Dilemma game, it is cautious enough to ensure minimax payoffs in the Rock-Scissors-Paper game.
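The abstract's central reduction can be illustrated with a small sketch: against an opponent whose play depends only on a bounded memory of recent actions, that memory is a Markov state, so the strategic best response is the optimal policy of an MDP. The sketch below is an assumption-laden toy, not the paper's algorithm: it uses Tit-for-Tat as an example memory-1 opponent in the iterated Prisoner's Dilemma and solves the induced MDP by value iteration.

```python
# Hedged sketch (not the paper's method): strategic best response to a
# memory-1 opponent in the iterated Prisoner's Dilemma, via the reduction
# to an MDP. Tit-for-Tat is an illustrative bounded-memory opponent; the
# state is the learner's previous action, i.e. the opponent's entire memory.

ACTIONS = ("C", "D")
# Prisoner's Dilemma payoffs to the row player (the learner).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(last_learner_action):
    # The opponent simply copies the learner's previous move.
    return last_learner_action

def value_iteration(gamma=0.9, iters=500):
    # V maps each MDP state (the learner's last action) to its value.
    V = {a: 0.0 for a in ACTIONS}
    for _ in range(iters):
        # Bellman backup: playing a in state s yields PAYOFF[(a, opponent's
        # reply to s)] and moves the MDP to state a.
        V = {s: max(PAYOFF[(a, tit_for_tat(s))] + gamma * V[a]
                    for a in ACTIONS)
             for s in ACTIONS}
    policy = {s: max(ACTIONS,
                     key=lambda a: PAYOFF[(a, tit_for_tat(s))] + gamma * V[a])
              for s in ACTIONS}
    return V, policy
```

Against Tit-for-Tat the optimal policy is to cooperate in every state: the strategic (multi-step) best response sustains mutual cooperation, whereas the tactical one-shot best response would defect. This is the distinction the abstract draws between strategic and tactical best response.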

BibTeX

@Article{Banerjee12:Strategic,
  author =       {Bikramjit Banerjee and Jing Peng},
  title =        {Strategic best-response learning in multi-agent systems},
  journal =      {Journal of Experimental and Theoretical Artificial
                 Intelligence},
  year =         {2012},
  volume =       {24},
  number =       {2},
  pages =        {139--160},
  publisher =    {Taylor \& Francis},
  abstract = {We present a novel and uniform formulation of the problem of
  reinforcement learning against bounded memory adaptive adversaries in
  repeated games, and the methodologies to accomplish learning in this
  novel framework. First we delineate a novel strategic definition of best
  response that optimises rewards over multiple steps, as opposed to the
  notion of tactical best response in game theory. We show that the
  problem of learning a strategic best response reduces to that of
  learning an optimal policy in a Markov Decision Process (MDP). We deal
  with both finite and infinite horizon versions of this problem. We adapt
  an existing Monte Carlo based algorithm for learning optimal policies in
  such MDPs over finite horizon, in polynomial time. We show that this new
  efficient algorithm can obtain higher average rewards than a previously
  known efficient algorithm against some opponents in the contract game.
  Though this improvement comes at the cost of increased domain knowledge,
  simple experiments in the Prisoner's Dilemma, and coordination games
  show that even when no extra domain knowledge (besides that an upper
  bound on the opponent's memory size is known) is assumed, the error can
  still be small. We also experiment with a general infinite-horizon
  learner (using function-approximation to tackle the complexity of
  history space) against a greedy bounded memory opponent and show that
  while it can create and exploit opportunities of mutual cooperation in
  the Prisoner's Dilemma game, it is cautious enough to ensure minimax
  payoffs in the Rock-Scissors-Paper game.},
}

Generated by bib2html.pl (written by Patrick Riley ) on Sat May 29, 2021 15:48:22