L. Kraemer and B. Banerjee. Reinforcement Learning of Informed Initial Policies for Decentralized Planning. ACM Transactions on Autonomous and Adaptive Systems (TAAS), 9(4):18:1–18:32, ACM Press, 2014.
Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multiagent systems where agents operate with noisy sensors and actuators and have access only to local information. Prevalent solution techniques are centralized and model-based, two limitations that we address with distributed reinforcement learning (RL). We particularly favor alternate learning, in which agents take turns learning best responses to each other; this appears to outperform concurrent RL, but it requires an initial policy to start from. We propose two principled approaches to generating informed initial policies: a naive approach, and a more sophisticated approach that builds on it. We empirically demonstrate that the refined approach produces near-optimal solutions in many challenging benchmark settings, making it an efficient (and realistic) approximate solver in its own right. Furthermore, alternate best-response learning seeded with such policies quickly learns high-quality policies as well.
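For intuition, here is a minimal Python sketch of the alternate best-response scheme the abstract describes: one agent Q-learns a best response while its teammates' policies are frozen, and the agents take turns. The `env` interface (`reset`, `step`, `actions`), the tabular Q-learning, and keying values on the latest local observation rather than full observation histories are all illustrative assumptions for brevity, not the paper's implementation.

```python
import random
from collections import defaultdict

def learn_best_response(env, learner, teammates, episodes=5000,
                        alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning of a best response for `learner` while every
    other agent follows its current (frozen) policy.

    Assumed env interface: reset() -> {agent: obs}, actions(agent) -> list,
    step(joint_action) -> ({agent: obs}, shared_reward, done).
    """
    q = defaultdict(float)
    acts = env.actions(learner)
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            # Frozen teammates act according to their fixed policies.
            joint = {a: pi(obs[a]) for a, pi in teammates.items()}
            # Epsilon-greedy choice over the learner's local observation.
            if random.random() < epsilon:
                joint[learner] = random.choice(acts)
            else:
                joint[learner] = max(acts, key=lambda x: q[(obs[learner], x)])
            nxt, r, done = env.step(joint)
            target = r if done else r + gamma * max(
                q[(nxt[learner], x)] for x in acts)
            q[(obs[learner], joint[learner])] += alpha * (
                target - q[(obs[learner], joint[learner])])
            obs = nxt
    # Return the greedy policy induced by the learned Q-values.
    return lambda o: max(acts, key=lambda x: q[(o, x)])

def alternate_learning(env, initial_policies, rounds=10):
    """Agents take turns learning best responses to each other, starting
    from informed initial policies (the seed the paper argues for)."""
    policies = dict(initial_policies)
    for _ in range(rounds):
        for agent in policies:
            frozen = {a: p for a, p in policies.items() if a != agent}
            policies[agent] = learn_best_response(env, agent, frozen)
    return policies
```

In the paper's actual setting, policies map local observation histories to actions and the informed seeds come from the proposed initialization procedures; the sketch only shows where such a seed plugs in, via the `initial_policies` argument.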
@Article{Kraemer14:Reinforcement,
  author    = {L. Kraemer and B. Banerjee},
  title     = {Reinforcement Learning of Informed Initial Policies for Decentralized Planning},
  journal   = {ACM Transactions on Autonomous and Adaptive Systems (TAAS)},
  year      = {2014},
  volume    = {9},
  number    = {4},
  pages     = {18:1--18:32},
  publisher = {ACM Press},
  abstract  = {Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multiagent systems where agents operate with noisy sensors and actuators and have access only to local information. Prevalent solution techniques are centralized and model-based, two limitations that we address with distributed reinforcement learning (RL). We particularly favor alternate learning, in which agents take turns learning best responses to each other; this appears to outperform concurrent RL, but it requires an initial policy to start from. We propose two principled approaches to generating informed initial policies: a naive approach, and a more sophisticated approach that builds on it. We empirically demonstrate that the refined approach produces near-optimal solutions in many challenging benchmark settings, making it an efficient (and realistic) approximate solver in its own right. Furthermore, alternate best-response learning seeded with such policies quickly learns high-quality policies as well.},
}