Combining Model-based Control and
Reinforcement Learning

We are interested in exploring the integration of model-based control techniques with reinforcement learning (RL) to enhance the performance and adaptability of autonomous systems. Model-based control provides a structured framework for decision-making based on known system dynamics, while RL offers the ability to learn optimal behaviors through interaction with the environment. By combining these approaches, we aim to leverage the strengths of both methodologies to achieve more robust and efficient control strategies.

Embedding GPU-Parallelized MPC into RL Pipelines

One promising direction is to embed Model Predictive Control (MPC) algorithms directly into RL training pipelines. However, RL is a data-intensive process, and evaluating an MPC problem for thousands of agents at every step of training can be computationally prohibitive.

To address this challenge, we are developing GPU-parallelized MPC solvers that can efficiently compute control actions for multiple agents simultaneously. By leveraging the parallel processing capabilities of modern GPUs, we can significantly reduce the computational overhead associated with MPC evaluation across large populations of agents. This enables us to integrate MPC into RL frameworks more seamlessly, allowing agents to benefit from model-based control while still learning from their interactions with the environment.
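
As a rough illustration of the idea (not the CusADi or Residual MPC implementation), the sketch below uses PyTorch to evaluate a simple finite-horizon LQR controller, standing in for a full constrained MPC solve, for thousands of simulated agents in a single batched GPU operation, and blends its output with a learned residual policy. The function names, dimensions, and blending weight are illustrative assumptions.

```python
# Minimal sketch: a GPU-batched model-based controller inside an RL rollout.
# An unconstrained finite-horizon LQR stands in for a full MPC solve; all
# names, shapes, and the residual blending weight are illustrative.
import torch

N_ENVS, N_X, N_U, HORIZON = 4096, 12, 4, 10
device = "cuda" if torch.cuda.is_available() else "cpu"

# Illustrative linear dynamics and quadratic costs shared by all agents.
A = torch.eye(N_X, device=device) + 0.01 * torch.randn(N_X, N_X, device=device)
B = 0.01 * torch.randn(N_X, N_U, device=device)
Q = torch.eye(N_X, device=device)
R = 0.1 * torch.eye(N_U, device=device)

def finite_horizon_lqr(A, B, Q, R, horizon):
    """Backward Riccati recursion; returns the first-step feedback gain."""
    P = Q.clone()
    for _ in range(horizon):
        K = torch.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K  # (N_U, N_X)

K = finite_horizon_lqr(A, B, Q, R, HORIZON)

def model_based_actions(states):
    """Evaluate the controller for every environment at once.

    states: (N_ENVS, N_X) tensor on the GPU; a single matmul replaces
    N_ENVS separate per-agent controller evaluations.
    """
    return -(states @ K.T)  # (N_ENVS, N_U)

# Inside the RL rollout, a learned policy adds a residual on top of the
# model-based action (the residual-MPC idea, sketched with a toy policy).
policy = torch.nn.Linear(N_X, N_U).to(device)
states = torch.randn(N_ENVS, N_X, device=device)
actions = model_based_actions(states) + 0.1 * policy(states)
```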

Relevant publications:
[1] Residual MPC: Blending Reinforcement Learning with GPU-Parallelized Model Predictive Control
[2] CusADi: A GPU Parallelization Framework for Symbolic Expressions and Optimal Control

Model-based Observations and Rewards for Learning

To improve learning performance for legged systems, we study how to incorporate physics-based models and biomechanics into the observation and reward structures used in RL. By providing agents with model-based quantities such as the centroidal angular momentum or linear inverted pendulum (LIP) footstep targets as part of their observations, we can guide their learning process towards more stable and efficient locomotion behaviors.
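
As a rough sketch of what such an observation augmentation can look like (the exact formulation in our publications differs), the snippet below appends the centroidal angular momentum and a capture-point-style LIP footstep target to a raw proprioceptive observation. The function names, dimensions, and constants are illustrative assumptions.

```python
# Minimal sketch: augmenting an RL observation with model-based quantities.
# The LIP footstep heuristic and all names/shapes are illustrative.
import numpy as np

GRAVITY = 9.81

def lip_footstep_target(com_pos_xy, com_vel_xy, com_height):
    """Capture-point-style footstep target from the linear inverted pendulum (LIP) model."""
    omega = np.sqrt(GRAVITY / com_height)    # LIP natural frequency
    return com_pos_xy + com_vel_xy / omega   # instantaneous capture point

def build_observation(proprio, com_pos, com_vel, centroidal_ang_momentum):
    """Concatenate raw proprioception with model-based features for the policy."""
    footstep_target = lip_footstep_target(com_pos[:2], com_vel[:2], com_height=com_pos[2])
    return np.concatenate([proprio, centroidal_ang_momentum, footstep_target])

# Example: a 45-dim proprioceptive state augmented with 3 + 2 model-based features.
obs = build_observation(proprio=np.zeros(45),
                        com_pos=np.array([0.0, 0.0, 0.9]),
                        com_vel=np.array([0.5, 0.0, 0.0]),
                        centroidal_ang_momentum=np.zeros(3))
```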

Compared to vanilla end-to-end reward tuning, structuring the observations and rewards around model-based insights leads to faster convergence and more robust policies. Notably, we observe the emergence of natural arm swing and torso dynamics that resemble human locomotion, indicating that the learned policies exploit biomechanical principles to improve movement efficiency.
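
To give a flavor of such a model-based reward term (the weighting and kernel below are illustrative choices, not the exact reward from the publications), one can regularize the whole-body angular momentum about the center of mass, which encourages coordinated arm swing:

```python
# Minimal sketch: a centroidal-momentum regularization term added to the
# usual task reward. The exponential kernel and weights are illustrative.
import numpy as np

def centroidal_momentum_reward(centroidal_ang_momentum, sigma=1.0):
    """Largest when the angular momentum about the center of mass is small."""
    return np.exp(-np.sum(centroidal_ang_momentum ** 2) / sigma ** 2)

def total_reward(task_reward, centroidal_ang_momentum, w_cam=0.5):
    # Model-based regularization on top of the usual task terms
    # (velocity tracking, survival, energy, etc.).
    return task_reward + w_cam * centroidal_momentum_reward(centroidal_ang_momentum)
```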

Relevant publications:
[1] Learning Humanoid Arm Motion via Centroidal Momentum Regularized Multi-Agent Reinforcement Learning
[2] Integrating Model-Based Footstep Planning with Model-Free Reinforcement Learning for Dynamic Legged Locomotion