Durgesh Kalwar

I am a researcher at TCS Research, working under the supervision of Harshad Khadilkar, where I am working on developing a new methodology for directed exploration using General Value Functions in Reinforcement Learning and its application to supply chain management. Lately, I've also been working on Safe RL and adaptive consensus-based distributed training of NN for a static graph topology.

Before joining TCS Research lab, I did my M.Tech. in Machine Learning and Computing from Indian Institute of Space Science and Technology. My thesis was on Safe Exploration in Reinforcement Learning and it mainly focused on safe sequential optimization in switching environments.

I pursued my undergraduate in Electrical Engineering from S. V. National Institute of Technology, India. My thesis was on State Estimation and Control Experiments on Quadrotor, in which I designed and implemented software on an STM32 microcontroller, which incorporated a Kalman filter and enhanced PID control algorithm.

Email  /  CV  /  GitHub  /  Google Scholar  /  LinkedIn  /  Transcript

profile photo

Research

I'm interested in reinforcement learning, machine learning, optimization, distributed learning and robotics.

project image

Guiding Offline Reinforcement Learning using Safety Expert


Richa Verma Durgesh Kalwar, Harshad Khadilkar, Balaraman Ravindran
Accepted in CODS-COMAD conference, 2024
paper /

In this paper, we proposed an approach called GORL (Guided Offline-RL) that leverages a safety expert to guide the offline RL agent in choosing safe actions in uncertain states. This approach can be applied to existing offline RL algorithms and has been shown to outperform current methods in terms of safety without sacrificing performance.

project image

Using General Value Functions to Learn Domain-Backed Inventory Management Policies


Durgesh Kalwar, Omkar Shelke, Harshad Khadilkar
arxiv, 2023
paper /

In this paper, we utilize GVFs to learn domain-backed inventory management policies. GVFs are trained on domain-critical characteristics such as prediction of stock-out probability and wastage quantity. Using this domain expertise for more effective exploration, we train an RL agent to compute the inventory replenishment quantities for a large range of products (up to 6000), which share aggregate constraints such as the total weight/volume per delivery. Additionally, we showed that the GVF predictions can be used to provide additional domain-backed insights into the decisions proposed by the RL agent.

project image

Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning


Durgesh Kalwar, Omkar Shelke, Somjit Nath, Hardik Meisheri, Harshad Khadilkar
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2023
paper / poster /

In this paper, we proposed a General Value functions (GVFs) based exploration strategy to effectively explore in a directed manner in hard exploration tasks.

project image

Safe Sequential Optimization in Switching Environments


Durgesh Kalwar, Vineeth Bala Sukumaran
National Conference on Communications (NCC), 2021
paper /

In this paper, we considered the problem of designing a sequential decision-making agent to maximize an unknown time-varying function that switches with time. The agent is also constrained to make safe decisions with a high probability during the exploration phase. For this safe sequential optimization problem, we proposed an algorithm called Adaptive-SafeOpt, which incorporates Bayesian optimization and change point detection.

project image

Safe Exploration in Reinforcement Learning


Durgesh Kalwar
Master's Thesis, 2021
paper /


Design and source code from Jon Barron's website