Decentralized MDPs (Dec-MDPs) provide a rigorous framework for collaborative multi-agent sequential decision making under uncertainty and partial observability, but their high computational complexity limits their practical impact. To overcome this complexity barrier, we focus on a class of Dec-MDPs consisting of independent collaborating agents that are tied together through a global reward function, which depends on their entire histories of states and actions in order to accomplish joint tasks. We make the following contributions to address the scalability of this class of problems:
- We propose a new actor-critic Reinforcement Learning (RL) approach for event-based Dec-MDPs using successor features (SFs), a value function representation that decouples the dynamics of the environment from the rewards.
- We then present Dec-ESR (Decentralized Event-based Successor Representation), which generalizes learning for event-based Dec-MDPs using SFs within an end-to-end deep RL framework.
- We also show that SFs allow useful transfer of information across related but distinct tasks, bootstrapping learning and substantially accelerating convergence on new tasks.
- For validation, we test our approach on a large multi-agent coverage problem that models schedule coordination of agents in a real urban subway network, and we obtain higher-quality solutions, in terms of the average global reward accumulated by the agents, than previous approaches.
Together, our inference and RL-based advances enable us to solve this large real-world multi-agent coverage problem at a scale where other approaches fail.
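The successor-feature decomposition underlying these contributions can be sketched in a few lines: the value function factors as V(s) = psi(s)·w, where psi depends only on the dynamics and w only on the reward, so a new task with the same dynamics needs only a new w. The toy chain MDP, one-hot features, and reward weights below are illustrative assumptions for a single agent under a fixed policy, not the paper's actual domain or algorithm.

```python
import numpy as np

# Toy 4-state chain MDP with one-hot state features (illustrative only).
# Successor features satisfy psi(s) = phi(s) + gamma * E[psi(s')], so for
# any reward of the form r(s) = phi(s) @ w, the value is V(s) = psi(s) @ w.
n_states, gamma = 4, 0.9
phi = np.eye(n_states)                 # one-hot features phi(s)
P = np.zeros((n_states, n_states))     # fixed-policy transition matrix
for s in range(n_states - 1):
    P[s, s + 1] = 1.0                  # deterministic chain 0 -> 1 -> 2 -> 3
P[-1, -1] = 1.0                        # last state is absorbing

# Closed-form SF under the fixed policy: psi = (I - gamma * P)^{-1} phi
psi = np.linalg.solve(np.eye(n_states) - gamma * P, phi)

# Task A: reward of 1 per step in the last state.
w_a = np.array([0.0, 0.0, 0.0, 1.0])
v_a = psi @ w_a                        # V_A(3) = 1 / (1 - gamma) = 10

# Task B: same dynamics, different reward -- reuse psi, swap only w.
w_b = np.array([0.0, 1.0, 0.0, 0.0])
v_b = psi @ w_b                        # V_B(1) = 1 (no reward after leaving)
```

Because `psi` encodes the dynamics once, the transfer in Task B costs a single matrix-vector product rather than re-solving the MDP, which is the mechanism behind the faster convergence on new tasks claimed above.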