Asymptotics and Non-Asymptotics in Control and Reinforcement Learning

Reinforcement learning is a highly active area of research, blending ideas and techniques from control, optimization, machine learning, and computer science. Given this diversity of viewpoints and frameworks, it is imperative to understand their strengths and limitations. The aim of this iDS2 virtual mini-workshop is to foster a constructive dialogue and exchange of ideas among researchers in these fields. It will feature two tutorial-style talks, emphasizing the asymptotic and the non-asymptotic perspectives respectively, followed by a moderated discussion with the speakers.

All events in this mini-workshop will take place on

Fridays, April 16, April 23, April 30

from 12:30 pm to 3:00 pm CST, with a break from 1:30 pm to 2:00 pm

REGISTER for the mini-workshop

 

April 16, 2021

 

Stochastic Control Problems, and How You Can Solve Yours

Sean Meyn (University of Florida)

 

Abstract: Convergence theory for reinforcement learning is sparse: barely existent for Q-learning outside of the special case of Watkins, and the situation is even worse for RL with nonlinear function approximation. This is unfortunate, given the current interest in neural networks.  What’s more, every user of RL knows that it can be insanely slow and unreliable. The talk will begin with explanations for slow convergence based on a combination of statistical reasoning and nonlinear dynamical systems theory. The special sauce in this lecture is an approach to universal stability of RL based on generalizations of Zap Q-learning.
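(For orientation, and not drawn from the talk materials: the "special case of Watkins" refers to tabular Q-learning, whose update in standard notation is

    Q_{n+1}(x_n, u_n) = Q_n(x_n, u_n) + \alpha_{n+1} \bigl[ r_{n+1} + \gamma \max_{u'} Q_n(x_{n+1}, u') - Q_n(x_n, u_n) \bigr],

where \alpha_n is the step-size sequence and \gamma the discount factor. Convergence in this tabular setting is well understood under the usual step-size conditions; far less is known once function approximation enters, which is the setting the talk addresses.)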
 
Apologies in advance: there will be no finite-n bounds in this lecture; everything is asymptotic. We will see why there is little hope for useful finite-n bounds when we consider algorithms with “noise” that has memory (such as in standard Markovian settings).

 

References:

  1. S. Chen, A. M. Devraj, F. Lu, A. Busic, and S. Meyn. Zap Q-Learning with nonlinear function approximation. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 16879–16890. Curran Associates, Inc., 2020. Also available as arXiv e-print 1910.05405.
  2. A. M. Devraj, A. Busic, and S. Meyn. Fundamental design principles for reinforcement learning algorithms. In K. G. Vamvoudakis, Y. Wan, F. L. Lewis, and D. Cansever, editors, Handbook on Reinforcement Learning and Control. Springer, 2021.
  3. S. Meyn. Control Systems and Reinforcement Learning. Cambridge University Press, 2021 (current draft available here).

 

April 23, 2021

 

Non-Stochastic Control Theory

Elad Hazan (Princeton University)

 

[additional materials]

 

Abstract: In this talk we will discuss an emerging paradigm in online and adaptive control. We will start with linear dynamical systems, a continuous subclass of reinforcement learning models widely used in robotics, finance, engineering, and meteorology. Classical control, since the work of Kalman, has focused on dynamics with Gaussian i.i.d. noise, quadratic loss functions, and, in terms of provably efficient algorithms, known systems and observed state. We’ll discuss how to apply new machine learning methods that relax all of the above: provably efficient control with adversarial noise, general loss functions, unknown systems, and partial observation. We will briefly survey recent work that applies this paradigm to black-box control, time-varying systems, and planning in iterative learning control.
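(For context, in standard notation not taken from the talk materials, a linear dynamical system evolves as

    x_{t+1} = A x_t + B u_t + w_t,

where x_t is the state, u_t the control input, and w_t the disturbance. In the classical setting of Kalman, w_t is Gaussian i.i.d. and the cost is quadratic; in the non-stochastic setting of this talk, w_t may be an arbitrary bounded, possibly adversarial sequence, the losses are general convex functions, and performance is measured by regret against a benchmark class of control policies.)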

No background is required for this talk, but some supplementary materials can be found here and here.

Based on a series of works with Naman Agarwal, Nataly Brukhim, Karan Singh, Sham Kakade, Max Simchowitz, Cyril Zhang, Paula Gradu, Brian Bullins, Xinyi Chen and Anirudha Majumdar.

 

April 30, 2021

 

Moderated Discussion