Reinforcement Learning (RL) and Trajectory Optimization (TO) are two powerful yet complementary approaches to control and decision-making.
Trajectory Optimization is data-efficient, exploits dynamics derivatives, and naturally handles constraints. However, it can suffer from poor local minima and requires significant online computation.
Reinforcement Learning, on the other hand, is derivative-free, less sensitive to local minima, and enables fast policy execution at deployment. But it is typically data-inefficient and does not explicitly account for constraints.
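In discrete time, both approaches can be viewed as attacking the same finite-horizon optimal control problem (the notation below is generic, not taken from the talk):

$$
\min_{u_0,\dots,u_{T-1}} \; \sum_{t=0}^{T-1} \ell(x_t, u_t) + \ell_T(x_T)
\qquad \text{s.t.} \quad x_{t+1} = f(x_t, u_t), \quad x_0 \ \text{given.}
$$

TO searches directly over the control sequence for one specific $x_0$, exploiting the derivatives of $f$ and $\ell$, and must be re-solved online whenever the initial state changes. RL instead searches over a policy $u_t = \pi(x_t)$ that is trained offline and is cheap to evaluate at deployment, which is where the trade-offs above come from.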
The central question of this talk is:
How can we combine RL and TO to exploit the strengths of both?
The talk proposes a structured classification of approaches for combining RL and TO.
The most powerful class consists of tightly coupled methods, which have the potential to speed up RL training and TO's online computation while also guiding TO towards high-quality solutions. The talk then examines the questions that arise in designing such combinations.
The final part of this talk presents CACTO, an actor–critic framework tightly integrated with derivative-based TO.
Core ideas: TO problems, warm-started by the current policy, are solved from varied initial states; the resulting trajectories and their costs-to-go are used to train the critic and the actor on the same cost that TO minimizes.
This creates a virtuous cycle: a better policy provides better warm-starts, better warm-starts help TO escape poor local minima and produce better trajectories, and better trajectories yield better value estimates and, in turn, a better policy.
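To make this loop concrete, here is a minimal sketch of such an interplay on a toy problem. It is an illustrative reconstruction under stated assumptions, not the CACTO implementation: the 1-D double-integrator task, the single-shooting optimizer, the network sizes, and the update rules are all choices made for the example.

```python
# Illustrative sketch (NOT the CACTO code): a learned policy warm-starts
# derivative-based trajectory optimization, and the TO solutions provide
# cost-to-go targets for a critic, which in turn improves the actor.
import torch
import torch.nn as nn

T, dt = 30, 0.1                                   # horizon and time step

def step(x, u):                                   # 1-D double integrator, differentiable
    return torch.stack([x[0] + dt * x[1], x[1] + dt * u])

def cost(x, u):                                   # quadratic cost driving the state to the origin
    return x[0] ** 2 + 0.1 * x[1] ** 2 + 0.01 * u ** 2

def rollout_cost(x0, U):                          # single-shooting objective for a control sequence
    x, c = x0, torch.tensor(0.0)
    for u in U:
        c = c + cost(x, u)
        x = step(x, u)
    return c

def solve_to(x0, U_init, iters=100, lr=0.05):
    """Gradient-based TO from a warm start; autograd supplies the dynamics derivatives."""
    U = U_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([U], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        rollout_cost(x0, U).backward()
        opt.step()
    return U.detach()

actor = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
critic = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

for episode in range(200):
    x0 = torch.empty(2).uniform_(-2.0, 2.0)       # random initial state

    # 1) Warm-start: roll out the current policy to initialize the control sequence.
    with torch.no_grad():
        U0, x = torch.zeros(T), x0
        for k in range(T):
            U0[k] = actor(x).squeeze()
            x = step(x, U0[k])

    # 2) Solve the TO problem starting from the policy's warm start.
    U = solve_to(x0, U0)

    # 3) Collect the visited states and their costs-to-go along the optimized trajectory.
    xs, cs, x = [x0], [], x0
    with torch.no_grad():
        for u in U:
            cs.append(cost(x, u))
            x = step(x, u)
            xs.append(x)
    ctg = torch.flip(torch.cumsum(torch.flip(torch.stack(cs), [0]), 0), [0])
    S = torch.stack(xs[:-1])

    # 4) Critic regression on the TO costs-to-go.
    opt_c.zero_grad()
    nn.functional.mse_loss(critic(S).squeeze(1), ctg).backward()
    opt_c.step()

    # 5) Actor update: minimize one-step cost plus critic value of the next state,
    #    differentiating through the known dynamics (not by imitating the TO controls).
    opt_a.zero_grad()
    loss_a = torch.tensor(0.0)
    for s in S:
        u = actor(s).squeeze()
        loss_a = loss_a + cost(s, u) + critic(step(s, u)).squeeze()
    (loss_a / len(S)).backward()
    opt_a.step()
```

The design choice mirrored from the description above is that TO and the actor-critic pair work on the same cost: TO supplies cost-to-go targets for the critic, and the actor is improved through the critic and the dynamics rather than by plain imitation of the TO controls.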
Experiments on integrator systems, a car model, and a manipulator show that CACTO significantly improves the quality of TO solutions compared to random or standard RL warm-starts.
Conclusion:
The most effective combination is not simple imitation or post-hoc refinement, but a tight coupling where RL and TO solve the same problem and guide each other toward globally optimal solutions.