Trung Nguyen and Bikramjit Banerjee. Reinforcement Learning as a Rehearsal for Swarm Foraging. Swarm Intelligence, 16(1):29–58, Springer, 2022.
Foraging in a swarm of robots has been investigated by many researchers, where the prevalent techniques have been hand-designed algorithms with parameters often tuned via machine learning. Our departure point is one such algorithm, where we replace a hand-coded decision procedure with reinforcement learning (RL), resulting in significantly superior performance. We situate our approach within the reinforcement learning as a rehearsal (RLaR) framework that we have recently introduced. We instantiate RLaR for the foraging problem, and experimentally show that a key component of RLaR---a conditional probability distribution function---can be modeled as a uni-modal distribution (with a lower memory footprint) despite evidence that it is multi-modal. Our experiments also show that the learned behavior has some degree of scalability in terms of variations in the swarm size or the environment.
@Article{Nguyen22:Reinforcement,
  author    = {Trung Nguyen and Bikramjit Banerjee},
  title     = {{Reinforcement Learning as a Rehearsal for Swarm Foraging}},
  journal   = {Swarm Intelligence},
  year      = {2022},
  volume    = {16},
  number    = {1},
  pages     = {29--58},
  publisher = {Springer},
  abstract  = {Foraging in a swarm of robots has been investigated by many researchers, where the prevalent techniques have been hand-designed algorithms with parameters often tuned via machine learning. Our departure point is one such algorithm, where we replace a hand-coded decision procedure with reinforcement learning (RL), resulting in significantly superior performance. We situate our approach within the reinforcement learning as a rehearsal (RLaR) framework that we have recently introduced. We instantiate RLaR for the foraging problem, and experimentally show that a key component of RLaR---a conditional probability distribution function---can be modeled as a uni-modal distribution (with a lower memory footprint) despite evidence that it is multi-modal. Our experiments also show that the learned behavior has some degree of scalability in terms of variations in the swarm size or the environment.},
}