
Bikramjit Banerjee's Publications


I2RL: online inverse reinforcement learning under occlusion

Saurabh Arora, Prashant Doshi, and Bikramjit Banerjee. I2RL: online inverse reinforcement learning under occlusion. Autonomous Agents and Multi-Agent Systems, 35(4), Springer, 2021.

Download

[PDF] 

Abstract

Inverse reinforcement learning (IRL) is the problem of learning the preferences of an agent from observing its behavior on a task. It inverts RL which focuses on learning an agent’s behavior on a task based on the reward signals received. IRL is witnessing sustained attention due to promising applications in robotics, computer games, and finance, as well as in other sectors. Methods for IRL have, for the most part, focused on batch settings where the observed agent’s behavioral data has already been collected. However, the related problem of online IRL—where observations are incrementally accrued, yet the real-time demands of the application often prohibit a full rerun of an IRL method—has received significantly less attention. We introduce the first formal framework for online IRL, called incremental IRL (I2RL), which can serve as a common ground for online IRL methods. We demonstrate the usefulness of this framework by casting existing online IRL techniques into this framework. Importantly, we present a new method that advances maximum entropy IRL with hidden variables to the online setting. Our analysis shows that the new method has monotonically improving performance with more demonstration data as well as probabilistically bounded error, both under full and partial observability. Simulated and physical robot experiments in a multi-robot patrolling application situated in varied-sized worlds, which involves learning under high levels of occlusion, show a significantly improved performance of I2RL as compared to both batch IRL and an online imitation learning method.

BibTeX

@Article{Arora20:I2RL,
  author = 	 {Saurabh Arora and Prashant Doshi and Bikramjit Banerjee},
  title = 	 {I2RL: online inverse reinforcement learning under occlusion},
  journal = 	 {Autonomous Agents and Multi-Agent Systems},
  year = 	 {2021},
  volume = 	 {35},
  number = 	 {4},
  publisher =    {Springer},
  abstract =     {Inverse reinforcement learning (IRL) is the
  problem of learning the preferences of an agent from observing
  its behavior on a task. It inverts RL which focuses on learning
  an agent’s behavior on a task based on the reward signals
  received. IRL is witnessing sustained attention due to promising
  applications in robotics, computer games, and finance, as well
  as in other sectors. Methods for IRL have, for the most part,
  focused on batch settings where the observed agent’s behavioral
  data has already been collected. However, the related problem of
  online IRL—where observations are incrementally accrued, yet the
  real-time demands of the application often prohibit a full rerun
  of an IRL method—has received significantly less attention. We
  introduce the first formal framework for online IRL, called
  incremental IRL (I2RL), which can serve as a common ground for
  online IRL methods. We demonstrate the usefulness of this
  framework by casting existing online IRL techniques into this
  framework. Importantly, we present a new method that advances
  maximum entropy IRL with hidden variables to the online setting.
  Our analysis shows that the new method has monotonically
  improving performance with more demonstration data as well
  as probabilistically bounded error, both under full and partial
  observability. Simulated and physical robot experiments in a
  multi-robot patrolling application situated in varied-sized worlds,
  which involves learning under high levels of occlusion, show a
  significantly improved performance of I2RL as compared to both
  batch IRL and an online imitation learning method.},
}

Generated by bib2html.pl (written by Patrick Riley) on Sat May 29, 2021 15:48:22