
Tarun Gupta

I'm a DPhil (PhD) student at the University of Oxford under the supervision of Shimon Whiteson, generously funded by the Clarendon Scholarship.

Previously, I graduated with bachelor's and master's degrees in Computer Science from IIIT Hyderabad, India. Before starting my DPhil, I worked as a research engineer at Singapore Management University (SMU) and as a data scientist at Grab, Singapore.

Email  /  Scholar  /  Twitter  /  Github  /  LinkedIn

  • June 2022: I will join Nvidia as a Research Intern from June 2022!
  • Mar 2021: I will join Google Waymo as a Research Intern from June 2021!
  • Apr 2019: Awarded the prestigious Clarendon Scholarship for doctoral studies at the University of Oxford.
  • Mar 2019: Awarded Dean's Gold Medal for the highest cumulative GPA in the graduating batch (IIIT Hyderabad).
  • Jan 2019: SMU and AAAI travel grant to attend AAAI 2019.
  • Dec 2018: Successfully defended my master's thesis.
  • Dec 2017: Google India, Microsoft India and AAAI travel grant to attend AAAI 2018.

I am currently interested in reinforcement learning, meta-learning, transfer learning, and graphical models. Recently, I've been focusing on cooperative multi-agent sequential decision-making, where a group of agents (e.g., autonomous vehicles) learn to behave sensibly and cooperate with one another in a decentralized manner.

[NEW] Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?
Tarun Gupta*, Christian Schroeder de Witt*, Denys Makoviichuk, Viktor Makoviychuk, Philip H.S. Torr, Mingfei Sun, Shimon Whiteson
* equal contribution
arXiv, 2020
PDF / Code

In this paper, we demonstrate that, despite its various theoretical shortcomings, Independent PPO (IPPO), a form of independent learning in which each agent simply estimates its local value function, can perform as well as or better than state-of-the-art joint learning approaches on the popular multi-agent benchmark suite SMAC with little hyperparameter tuning.

[NEW] UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning
Tarun Gupta, Anuj Mahajan, Bei Peng, Wendelin Böhmer, Shimon Whiteson
ICML 2021: International Conference on Machine Learning
PDF / Video

We propose universal value exploration (UneVEn) for multi-agent reinforcement learning (MARL) to address the suboptimal approximations made by the monotonic joint-action value functions employed in current state-of-the-art value-based MARL methods on non-monotonic tasks.

[NEW] Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients
Bozhidar Vasilev, Tarun Gupta, Bei Peng, Shimon Whiteson
ALA 2021: Adaptive and Learning Agents (ALA) Workshop at AAMAS 2021
PDF / Video / Slides

We propose semi-on-policy (SOP) training as an effective and computationally efficient way to address the sample inefficiency of on-policy policy gradient methods, and show that our methods perform as well as or better than state-of-the-art value-based methods on a variety of SMAC tasks.

[NEW] RODE: Learning Roles to Decompose Multi-Agent Tasks
Tonghan Wang, Tarun Gupta, Anuj Mahajan, Bei Peng, Shimon Whiteson, Chongjie Zhang
ICLR 2021: International Conference on Learning Representations
PDF / Video / Code

We propose a scalable role-based multi-agent learning method which effectively discovers roles based on joint action space decomposition according to action effects, establishing a new state of the art on the StarCraft multi-agent benchmark.

Reinforcement Learning for Zone Based Multiagent Pathfinding under Uncertainty
Jiajing Ling, Tarun Gupta, Akshat Kumar
ICAPS 2020: International Conference on Automated Planning and Scheduling
PDF / Video

We address the problem of multiple agents finding paths from their respective source nodes to destination nodes in a graph (also called MAPF) using difference-of-convex functions (DC) programming, which can effectively minimize congestion in zones while ensuring that agents reach their final destinations.

Successor Features Based Multi-Agent RL for Event-Based Decentralized MDPs
Tarun Gupta, Akshat Kumar, Praveen Paruchuri
AAAI 2019: AAAI Conference on Artificial Intelligence
PDF / Code

We propose a new actor-critic-based reinforcement learning (RL) approach for event-based Dec-MDPs, called Decentralized Event based Successor Representation, which generalizes learning for event-based Dec-MDPs using successor features (SFs) within an end-to-end deep RL framework.

Planning and Learning For Decentralized MDPs With Event Driven Rewards
Tarun Gupta, Akshat Kumar, Praveen Paruchuri
AAAI 2018: AAAI Conference on Artificial Intelligence
PDF / Code

We propose a nonlinear programming (NLP) formulation, a probabilistic-inference-based approach, and a policy-gradient-based multiagent reinforcement learning approach to improve the scalability of Decentralized (PO)MDPs in which a large number of agents interact through complex joint rewards that depend on their entire histories of states and actions.

Other Projects
Cloud Orchestration Layer

Built a framework similar to the Amazon EC2 console that coordinates the provisioning of compute and storage resources by negotiating with a set of hypervisors running across physical servers in a datacenter.

AI for Ultimate Tic Tac Toe

Developed an automated AI-based player for Ultimate Tic Tac Toe, implemented in Python using greedy-heuristic-based alpha-beta pruning with an optimized search-tree depth.
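
For readers unfamiliar with the technique, here is a minimal sketch of depth-limited alpha-beta pruning with greedy heuristic move ordering. It is not the project's actual code: the function names, the nested-list toy tree, and the placeholder heuristic (0 for non-leaf nodes) are all assumptions made for illustration.

```python
from math import inf


def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Depth-limited minimax with alpha-beta pruning.

    children(node) returns the successor nodes (empty for terminal nodes).
    evaluate(node) returns a heuristic score from the maximizing player's view.
    """
    successors = children(node)
    if depth == 0 or not successors:
        return evaluate(node)

    # Greedy move ordering: search the heuristically best successors first,
    # which tends to trigger cut-offs earlier.
    successors = sorted(successors, key=evaluate, reverse=maximizing)

    if maximizing:
        best = -inf
        for child in successors:
            score = alphabeta(child, depth - 1, alpha, beta, False, children, evaluate)
            best = max(best, score)
            alpha = max(alpha, best)
            if alpha >= beta:  # the minimizer will never allow this branch
                break
        return best

    best = inf
    for child in successors:
        score = alphabeta(child, depth - 1, alpha, beta, True, children, evaluate)
        best = min(best, score)
        beta = min(beta, best)
        if alpha >= beta:  # the maximizer already has a better option
            break
    return best


# Hypothetical toy game tree: inner lists are positions, integers are leaf scores.
def toy_children(node):
    return node if isinstance(node, list) else []


def toy_evaluate(node):
    # Placeholder heuristic: exact score at leaves, neutral 0 elsewhere.
    return node if isinstance(node, int) else 0


toy_tree = [[3, 5], [2, 9], [0, 1]]
best_score = alphabeta(toy_tree, 2, -inf, inf, True, toy_children, toy_evaluate)
print(best_score)
```

In a real game, `children` would generate legal moves and `evaluate` would score board positions; the depth limit trades playing strength against search time.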

2D Carrom Game

A 2D Carrom game built with OpenGL.

Web Proxy

A simple web proxy that passes requests and data between a web client and a web server.

Spam Mail Filtering

Analyzed and compared the performance of various classification algorithms for spam filtering.

Teaching (IIIT Hyderabad)

Multi Agent Systems -- Monsoon’17
Instructor: Prof. Praveen Paruchuri

Optimization Methods -- Spring’17 and Spring’18
Instructor: Prof. Sujit Gujar

Statistical Methods in Artificial Intelligence (Intro. to Machine Learning) -- Monsoon’16
Instructor: Prof. Avinash Sharma

Artificial Intelligence -- Spring’16
Instructor: Prof. Praveen Paruchuri

Structured System Analysis and Design -- Monsoon’15
Instructor: Prof. Raghu Reddy