Optimization, Control, and Reinforcement Learning

Wednesday, February 24
9:00 – 12:00
Zoom link in registration email

Optimization is the engine of the modern “data-knowledge-decision” pipeline, while control theory and reinforcement learning enable sequential decision making in rapidly changing, dynamic environments. The ubiquity of systems that require automatic control, together with the many successes of reinforcement learning, further motivates the development of theory in these areas.

The Coordinated Science Laboratory (originally the Control Systems Laboratory) has a long record of groundbreaking research in these fields, a success that stems from CSL's collaborative and interdisciplinary environment. The Optimization, Control, and Reinforcement Learning session features a keynote speech by Prof. Na Li, a prominent researcher in the area, and an invited talk by Zhuoran Yang, a rising star. We invite all to submit work related (but not restricted) to optimization, control, decision theory, game theory, and reinforcement learning.


Keynote Speaker – Prof. Na Li, Harvard University

Title: Real-time Distributed Decision Making in Networked Systems

Time: 11:00 – 12:00, February 24

Abstract: Uncertainties pose a key challenge in many sequential decision-making problems for dynamical systems, such as energy systems, transportation, building operation, and networking. The uncertainties include both unknown system dynamics and volatile external disturbances. In this talk, I will present our recent progress in formally advancing the systematic design of real-time decision making in networked systems, focusing on the challenges raised by uncertainties. We first present our recently developed scalable multi-agent reinforcement learning algorithms, which use only local sensing and communication yet learn nearly optimal localized policies for the global network. We then present our online optimal control algorithms with time-varying cost functions and rigorously show how to use predictions effectively to achieve nearly optimal online performance with fast computation. We will also discuss several extensions motivated by real applications.
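As a rough illustration of the second thread, the sketch below shows a receding-horizon controller that uses predictions of time-varying costs: at each step it plans over the next k predicted tracking targets and commits only the first input. The scalar dynamics, cost weights, target sequence, and window length k are all illustrative assumptions, not the exact setting of the talk.

import numpy as np
from scipy.optimize import minimize

# Toy online control with k-step cost predictions: scalar linear system
# x_{t+1} = a*x_t + b*u_t tracking a time-varying target theta_t.
# All dynamics, costs, and the window k are illustrative assumptions.
a, b, k, T = 0.9, 1.0, 5, 30
theta = np.sin(0.3 * np.arange(T + k))   # predicted time-varying targets

def rollout_cost(u_seq, x0, t):
    """Sum of predicted tracking + control costs over the next k steps."""
    x, cost = x0, 0.0
    for i, u in enumerate(u_seq):
        cost += (x - theta[t + i]) ** 2 + 0.1 * u ** 2
        x = a * x + b * u
    return cost

x = 0.0
for t in range(T):
    # Plan over the prediction window, then commit only the first input.
    res = minimize(rollout_cost, np.zeros(k), args=(x, t))
    u = res.x[0]
    x = a * x + b * u
    print(f"t={t:2d}  x={x:+.3f}  target={theta[t]:+.3f}")

In this simple setup, a longer window lets the controller anticipate the moving target; how much prediction actually helps, and how fast such schemes can be made, is the subject of the talk.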

Biography: Na Li is a Gordon McKay Professor of Electrical Engineering and Applied Mathematics in the School of Engineering and Applied Sciences at Harvard University. She received her bachelor's degree in Mathematics from Zhejiang University in 2007 and her Ph.D. in Control and Dynamical Systems from the California Institute of Technology in 2013. She was a postdoctoral associate in the Laboratory for Information and Decision Systems at the Massachusetts Institute of Technology from 2013 to 2014, and joined Harvard University in 2014. Her research lies in distributed optimization and control of cyber-physical networked systems. She serves as an associate editor for IEEE Transactions on Automatic Control and Systems & Control Letters and has served on the organizing committees of various conferences. Her honors include the NSF CAREER Award (2016), AFOSR Young Investigator Award (2017), ONR Young Investigator Award (2019), Donald P. Eckman Award (2019), and McDonald Mentoring Award (2020), among others.


Student Keynote Speaker – Zhuoran Yang, Princeton University

Title: Demystifying (Deep) Reinforcement Learning: The Optimist, The Pessimist, and Their Provable Efficiency

Time: 9:00 – 9:40, February 24

Abstract: Coupled with powerful function approximators such as deep neural networks, reinforcement learning (RL) has achieved tremendous empirical success. However, its theoretical understanding lags behind. In particular, it remains unclear how to provably attain the optimal policy with finite regret or sample complexity. In this talk, we will present two sides of the same coin, demonstrating an intriguing duality between optimism and pessimism.

- In the online setting, we aim to learn the optimal policy by actively interacting with an environment. To strike a balance between exploration and exploitation, we propose an optimistic least-squares value iteration algorithm, which achieves √T regret in the presence of linear, kernel, and neural function approximators.

- In the offline setting, we aim to learn the optimal policy from a dataset collected a priori. Without active interaction with the environment, we suffer from insufficient coverage of the dataset. To maximally exploit the dataset, we propose a pessimistic least-squares value iteration algorithm, which achieves minimax-optimal sample complexity. (A toy sketch of both variants appears below.)
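For concreteness, here is a minimal numpy sketch of least-squares value iteration with a bonus term, on a toy tabular MDP viewed as a linear MDP with one-hot features: subtracting the bonus gives the pessimistic (offline) variant, while adding it gives the optimistic (online) one. The dataset, the bonus scaling beta, and all names are illustrative assumptions rather than the speaker's exact algorithms.

import numpy as np

nS, nA, H = 4, 2, 5          # states, actions, horizon
d = nS * nA                  # one-hot feature dimension

def phi(s, a):
    v = np.zeros(d)
    v[s * nA + a] = 1.0
    return v

rng = np.random.default_rng(0)
# Fabricated offline dataset of (s, a, r, s') transitions, for illustration only.
data = [(rng.integers(nS), rng.integers(nA), rng.random(),
         rng.integers(nS)) for _ in range(500)]

lam, beta = 1.0, 0.1
V = np.zeros(nS)             # V_{H+1} = 0
policy = np.zeros((H, nS), dtype=int)

for h in reversed(range(H)):
    # Ridge regression: fit Q_h(s, a) ≈ r + V_{h+1}(s') from the data.
    Lam = lam * np.eye(d)
    bvec = np.zeros(d)
    for s, a, r, s2 in data:
        f = phi(s, a)
        Lam += np.outer(f, f)
        bvec += f * (r + V[s2])
    w = np.linalg.solve(Lam, bvec)
    Lam_inv = np.linalg.inv(Lam)

    Q = np.zeros((nS, nA))
    for s in range(nS):
        for a in range(nA):
            f = phi(s, a)
            bonus = beta * np.sqrt(f @ Lam_inv @ f)
            # Pessimism: subtract the bonus (use +bonus for optimism).
            Q[s, a] = np.clip(f @ w - bonus, 0.0, H - h)
    policy[h] = Q.argmax(axis=1)
    V = Q.max(axis=1)        # becomes V_h for the next backward step

print(policy)

The bonus shrinks where the data (or online experience) covers a state-action pair well, which is the mechanism behind both the √T online regret and the offline sample-complexity guarantees discussed in the talk.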

Biography: Zhuoran Yang is a final-year Ph.D. student in the Department of Operations Research and Financial Engineering at Princeton University, advised by Professor Jianqing Fan and Professor Han Liu. Before attending Princeton, he obtained a bachelor's degree in mathematics from Tsinghua University. His research interests lie at the interface of machine learning, statistics, and optimization. The primary goal of his research is to design a new generation of machine learning algorithms for large-scale and multi-agent decision-making problems, with both statistical and computational guarantees. He is also interested in applying learning-based decision-making algorithms to real-world problems arising in robotics, personalized medicine, and computational social science.


Student Speakers

Hamza El-Kebir, UofI
Guaranteed Reachability for Systems with Impaired Dynamics
9:40 – 10:00, February 24


Xingyu Bai, UofI
Asymptotic Optimality of Semi-Open-Loop Policies in Markov Decision Processes with Large Lead Times
10:00 – 10:20, February 24


Junchi Yang, UofI
A Catalyst Framework for Minimax Optimization
10:20 – 10:40, February 24


Aaron Havens, UofI
Enforcing Stability Guarantees for Imitation Learning of Linear Control Policies
10:40 – 11:00, February 24