
Bikramjit Banerjee's Publications


Multi-agent reinforcement learning as a rehearsal for decentralized planning

Landon Kraemer and Bikramjit Banerjee. Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing, 190:82–94, Elsevier, 2016.

Download

[PDF] 

Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) are a powerful tool for modeling multi-agent planning and decision-making under uncertainty. Prevalent Dec-POMDP solution techniques require centralized computation given full knowledge of the underlying model. Multi-agent reinforcement learning (MARL) based approaches have been recently proposed for distributed solution of Dec-POMDPs without full prior knowledge of the model, but these methods assume that conditions during learning and policy execution are identical. In some practical scenarios this may not be the case. We propose a novel MARL approach in which agents are allowed to rehearse with information that will not be available during policy execution. The key is for the agents to learn policies that do not explicitly rely on these rehearsal features. We also establish a weak convergence result for our algorithm, RLaR, demonstrating that RLaR converges in probability when certain conditions are met. We show experimentally that incorporating rehearsal features can enhance the learning rate compared to non-rehearsal-based learners, and demonstrate fast, (near) optimal performance. We also compare RLaR against an existing approximate Dec-POMDP solver which, like RLaR, does not assume a priori knowledge of the model. While RLaR's policy representation is not as scalable, we show that RLaR produces higher quality policies for most problems and horizons studied.
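
The following is a minimal, illustrative sketch (not taken from the paper) of the rehearsal idea in a single-agent, tabular Q-learning setting: during learning the agent may condition value estimates on the hidden state (a rehearsal feature), but the policy it ultimately executes is conditioned only on its observation history. The class name, environment interface, and update scheme are hypothetical, chosen only to show the separation between rehearsal-time and execution-time information.

import random
from collections import defaultdict

class RehearsalQLearner:
    """Hypothetical sketch: rehearse with the hidden state, execute on history only."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q_rehearsal = defaultdict(float)  # keyed by (state, history, action); learning only
        self.q_exec = defaultdict(float)       # keyed by (history, action); used at execution

    def act(self, history, explore=True):
        # The executed policy depends only on information available at execution time.
        # 'history' is assumed to be a hashable tuple of past actions/observations.
        if explore and random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q_exec[(history, a)])

    def update(self, state, history, action, reward, next_state, next_history):
        # Rehearsal update: the hidden state is visible here, during learning only.
        best_next = max(self.q_rehearsal[(next_state, next_history, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        key = (state, history, action)
        self.q_rehearsal[key] += self.alpha * (target - self.q_rehearsal[key])
        # Fold the rehearsal estimate into a history-only value, so the final
        # policy does not explicitly rely on the rehearsal feature.
        ekey = (history, action)
        self.q_exec[ekey] += self.alpha * (self.q_rehearsal[key] - self.q_exec[ekey])

This sketch only captures the idea that extra information may be used while learning but not by the executed policy; it is not the RLaR algorithm itself, which is defined in the paper for the multi-agent Dec-POMDP setting.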

BibTeX

@Article{Kraemer16:Multi-agent,
  author = 	 {Landon Kraemer and Bikramjit Banerjee},
  title = 	 {Multi-agent reinforcement learning as a rehearsal for
                  decentralized planning},
  journal = 	 {Neurocomputing},
  year = 	 {2016},
  volume = 	 {190},
  pages = 	 {82--94},
  publisher =    {Elsevier},
  abstract =     {Decentralized partially observable Markov decision
  processes (Dec-POMDPs) are a powerful tool for modeling multi-agent
  planning and decision-making under uncertainty. Prevalent Dec-POMDP
  solution techniques require centralized computation given full knowledge
  of the underlying model. Multi-agent reinforcement learning (MARL) based
   approaches have been recently proposed for distributed solution of 
   Dec-POMDPs without full prior knowledge of the model, but these methods
   assume that conditions during learning and policy execution are
   identical. In some practical scenarios this may not be the case. We
   propose a novel MARL approach in which agents are allowed to rehearse
   with information that will not be available during policy execution.
   The key is for the agents to learn policies that do not explicitly rely
   on these rehearsal features. We also establish a weak convergence
   result for our algorithm, RLaR, demonstrating that RLaR converges in
   probability when certain conditions are met. We show experimentally
   that incorporating rehearsal features can enhance the learning rate
   compared to non-rehearsal-based learners, and demonstrate fast, (near)
   optimal performance. We also compare RLaR against an existing approximate
   Dec-POMDP solver which, like RLaR, does not assume a priori knowledge of
   the model. While RLaR's policy representation is not as scalable, we show that
   RLaR produces higher quality policies for most problems and horizons
   studied.},
}

Generated by bib2html.pl (written by Patrick Riley) on Sat May 29, 2021 15:48:22