L. Kraemer and B. Banerjee. Reinforcement Learning of Informed Initial Policies for Decentralized Planning. ACM Transactions on Autonomous and Adaptive Systems (TAAS), 9(4):18:1–18:32, ACM Press, 2014.
Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multiagent systems where agents operate with noisy sensors and actuators and have access only to local information. Prevalent solution techniques are centralized and model-based, two limitations that we address with distributed reinforcement learning (RL). We particularly favor alternate learning, in which agents take turns learning best responses to each other; this appears to outperform concurrent RL, but it requires an initial policy to start from. We propose two principled approaches to generating informed initial policies: a naive approach, and a more sophisticated approach that builds on it. We empirically demonstrate that the refined approach produces near-optimal solutions in many challenging benchmark settings, making it an efficient (and realistic) approximate solver in its own right. Furthermore, alternate best-response learning seeded with such policies quickly learns high-quality policies as well.
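For intuition, here is a minimal Python sketch of the alternate best-response scheme the abstract describes: one agent Q-learns a best response while its teammates' policies are frozen, and the agents take turns. The `env` interface (`reset`, `step`, `actions`), the tabular Q-learning, and keying values on the latest local observation rather than full observation histories are all illustrative assumptions for brevity, not the paper's implementation.

```python
import random
from collections import defaultdict

def learn_best_response(env, learner, teammates, episodes=5000,
                        alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning of a best response for `learner` while every
    other agent follows its current (frozen) policy.

    Assumed env interface: reset() -> {agent: obs}, actions(agent) -> list,
    step(joint_action) -> ({agent: obs}, shared_reward, done).
    """
    q = defaultdict(float)
    acts = env.actions(learner)
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            # Frozen teammates act according to their fixed policies.
            joint = {a: pi(obs[a]) for a, pi in teammates.items()}
            # Epsilon-greedy choice over the learner's local observation.
            if random.random() < epsilon:
                joint[learner] = random.choice(acts)
            else:
                joint[learner] = max(acts, key=lambda x: q[(obs[learner], x)])
            nxt, r, done = env.step(joint)
            target = r if done else r + gamma * max(
                q[(nxt[learner], x)] for x in acts)
            q[(obs[learner], joint[learner])] += alpha * (
                target - q[(obs[learner], joint[learner])])
            obs = nxt
    # Return the greedy policy induced by the learned Q-values.
    return lambda o: max(acts, key=lambda x: q[(o, x)])

def alternate_learning(env, initial_policies, rounds=10):
    """Agents take turns learning best responses to each other, starting
    from informed initial policies (the seed the paper argues for)."""
    policies = dict(initial_policies)
    for _ in range(rounds):
        for agent in policies:
            frozen = {a: p for a, p in policies.items() if a != agent}
            policies[agent] = learn_best_response(env, agent, frozen)
    return policies
```

In the paper's actual setting, policies map local observation histories to actions and the informed seeds come from the proposed initialization procedures; the sketch only shows where such a seed plugs in, via the `initial_policies` argument.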
@Article{Kraemer14:Reinforcement,
  author    = {L. Kraemer and B. Banerjee},
  title     = {Reinforcement Learning of Informed Initial Policies for Decentralized Planning},
  journal   = {ACM Transactions on Autonomous and Adaptive Systems (TAAS)},
  year      = {2014},
  volume    = {9},
  number    = {4},
  pages     = {18:1--18:32},
  publisher = {ACM Press},
  abstract  = {Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multiagent systems where agents operate with noisy sensors and actuators and have access only to local information. Prevalent solution techniques are centralized and model-based, two limitations that we address with distributed reinforcement learning (RL). We particularly favor alternate learning, in which agents take turns learning best responses to each other; this appears to outperform concurrent RL, but it requires an initial policy to start from. We propose two principled approaches to generating informed initial policies: a naive approach, and a more sophisticated approach that builds on it. We empirically demonstrate that the refined approach produces near-optimal solutions in many challenging benchmark settings, making it an efficient (and realistic) approximate solver in its own right. Furthermore, alternate best-response learning seeded with such policies quickly learns high-quality policies as well.},
}