## Fall 2020 iDS^{2} Seminar

NOTE: all seminars for Fall 2020 are held remotely through Zoom. Recordings of seminars are available below. Seminars run on Friday at 2:00pm unless otherwise stated.

December 04, 2020

Shipra Agrawal (Assistant Professor, Columbia University)

**Learning in structured MDPs with convex cost function: improved regret bounds for inventory management**

## Abstract

The stochastic inventory control problem under censored demands is a fundamental problem in revenue and supply chain management. A simple class of policies called “base-stock policies” is known to be optimal for this problem, and further, the convexity of long-run average-cost under such policies has been established. In this work, we present a learning algorithm for the stochastic inventory control problem under lost sales penalty and positive lead times, when the demand distribution is a priori unknown. Our main result is a regret bound of O(L\sqrt{T}+D) for the algorithm, where T is the time horizon, L is the fixed and known lead time, and D is an unknown parameter of the demand distribution described roughly as the number of time steps needed to generate enough demand for depleting one unit of inventory. Our results significantly improve the existing regret bounds for this problem. Notably, even though the state space of the underlying Markov Decision Process (MDP) in this problem is continuous and L-dimensional, our regret bounds depend linearly on L. Our techniques utilize convexity of the long-run average cost and a newly derived bound on `bias’ of base-stock policies, to establish an almost blackbox connection between the problem of learning and optimization in such MDPs and stochastic convex bandit optimization. The techniques presented here may be of independent interest for other settings that involve large structured MDPs but with convex cost functions.

[slides]

November 20, 2020

Daniel Russo (Assistant Professor, Columbia University)

**On the Global Convergence and Approximation Benefits of Policy Gradient Methods**

## Abstract

Policy gradient methods apply to complex, poorly understood, control problems by performing stochastic gradient descent over a parameterized class of polices. Unfortunately, due to the multi-period nature of the objective, they face non-convex optimization problems. The first part of the talk aims to clarify when policy gradient methods can be expected to converge to a global optimum despite this non-convexity. We show they can get stuck in suboptimal stationary points in extremely simple problems but identify structural properties – shared by several canonical control problems – that guarantee the policy gradient objective function has no suboptimal stationary points despite being non-convex. In the second part of the talk, I’ll zoom in on the special case of state aggregated policies, which induce unavoidable approximation errors as they cannot represent an optimal policy. In this case, I’ll show policy gradient provably converges to better policies than popular approximate policy iteration and value iteration methods.

[slides]

November 13, 2020

Yasaman Bahri (Google Brain)

**A Phase Transition in Gradient Descent for Wide, Deep Neural Networks**

## Abstract

Recent investigations into infinitely-wide deep neural networks have given rise to intriguing connections between deep networks, kernel methods, and Gaussian processes. Backing off of the infinite-width limit, one may wonder to what extent finite-width neural networks will be describable by including perturbative corrections to these results. We identify a regime that appears to be sharply different from such a description. The choice of learning rate in gradient descent is a crucial factor, naturally categorizing the dynamics of deep neural networks into two classes that are separated by a (sharp) phase transition as networks become wider. I will describe the distinct signatures of the two phases, how they are elucidated in a class of solvable simple models, and the implications for neural network performance.

[slides]

November 5, 2020

Zico Kolter (Associate Professor, CMU)

**Equilibrium approaches to deep learning: One (implicit) layer is all you need**

## Abstract

Does deep learning actually need to be deep? In this talk, I will present some of our recent and ongoing work on Deep Equilibrium (DEQ) Models, an approach that demonstrates we can achieve most of the benefits of modern deep learning systems using very shallow models, but ones which are defined implicitly via finding a fixed point of a nonlinear dynamical system. I will show that these methods can achieve results on par with the state of the art in domains spanning large-scale language modeling, image classification, and semantic segmentation, while requiring less memory and simplifying architectures substantially. I will also highlight some recent work analyzing the theoretical properties of these systems, where we show that certain classes of DEQ models are guaranteed to have a unique fixed point, easily-controlled Lipschitz constants, and efficient algorithms for finding the equilibria. I will conclude by discussing ongoing work and future directions for these classes of models.

[slides][video]

October 30, 2020

Maryam Fazel (Professor, UW)

**Learning Linear Dynamical Systems: Improved Rates and the Role of Regularization**

## Abstract

We consider the problem of learning linear dynamical systems from input-output data, or system identification, given limited output samples. Learning the system dynamics is often the basis of associated control or policy decision problems in tasks varying from linear-quadratic control to deep reinforcement learning. Recent literature has provided finite-sample statistical analysis for simple least-squares regression models applied to this problem. When a low-order model is desired, adding a Hankel nuclear norm regularization term to the least squares problem has been helpful in practice (e.g., simplifies model selection, is less sensitive to hyperparameter tuning). In this talk, we discuss a new theoretical analysis and insights for the regularized scheme.

[slides]

October 16, 2020

Mehrdad Moharrami (Postdoc, UIUC)

**Planted Matching Problem**

## Abstract

We study the problem of recovering a planted matching in randomly weighted complete bipartite graphs with n nodes. For some unknown perfect matching M*, the weight of an edge is drawn from one distribution P if the edge belongs to M* and another distribution Q otherwise. Our goal is to infer M*, exactly or approximately, from the edge weights. In this work, we take P=exp(lambda) and Q=exp(1/n), in which case the maximum-likelihood estimator of M* is the minimum-weight matching M. We obtain precise results on the overlap between M* and M, i.e., the fraction of edges they have in common. For lambda >= 4, we have almost perfect recovery, with overlap 1-o(1) with high probability. For lambda < 4, the expected overlap is an explicit function alpha(lambda) < 1: we compute it by generalizing Aldous’ celebrated proof of the zeta(2) conjecture for the un-planted model, using local weak convergence to relate weighted complete bipartite graphs with n nodes to a type of weighted infinite tree, and then deriving a system of differential equations from a message-passing algorithm on this tree.

October 9, 2020

Semih Cayci (Postdoc, UIUC)

**Budget-Constrained Learning and Optimization with Bandit Feedback**

## Abstract

In numerous decision-making processes such as algorithm portfolio design and adaptive routing, each action incurs a random cost and yields a random and potentially correlated reward, where the process continues until the total cost exceeds a resource constraint. Motivated by these applications, we first investigate the general budgeted bandit problem in which the controller aims to maximize the total reward subject to a budget constraint on the total cost. We show that logarithmic regret is achievable for unbounded cost and reward under mild moment conditions. In particular, we propose robust learning algorithms that incorporate linear estimators to extract and exploit the correlation between the cost and reward of an arm for optimal sample complexity. We prove that these algorithms achieve tight regret bounds, which are optimal up to a constant factor in the Gaussian case. In the second part, we focus on the time-constrained bandit problem, and allow the controller to interrupt an ongoing task and forgo its reward for a potentially faster alternative. We show that this controlled interrupt mechanism improves the total reward linearly over time for a broad class of completion time distributions. We propose learning algorithms that learn the optimal arm and interrupt strategy with order-optimal regret. We show that these learning algorithms provide efficient solutions for boosting the time-efficiency of stochastic local search methods in various important applications that require early stopping, such as the *k*-SAT problem.

October 2, 2020

**Zhuoran Yang** (PhD Student, Princeton)

**Provably Efficient Exploration in Reinforcement Learning: An Optimistic Approach**

## Abstract

Modern Reinforcement Learning (RL) is commonly applied to practical problems with an enormous number of states, where function approximation such as deep neural networks must be deployed to approximate either the value function or the policy. The introduction of function approximation raises a fundamental set of challenges involving computational and statistical efficiency, especially under the online setting with active data acquisition. As a result, a core RL question remains open: how can we design provably efficient RL algorithms that incorporate possibly nonlinear function approximation?

In this talk, I will introduce the first generation of efficient value-based and policy-based RL algorithms under the setting where both the value function and policy are represented by powerful function approximators such as the kernel and neural network functions. The proposed algorithms highlight a systematic integration of the “optimism under the face of uncertainty’’ principle into algorithm design and are shown to enjoy both polynomial runtime and polynomial sample complexity. Finally, as an initial attempt to study multi-agent reinforcement learning, I will show how the value-based algorithm can be modified for solving zero-sum stochastic games with efficiency.

[video]

September 11, 2020

**Katherine Tsai** (PhD Student, UIUC)

**A Nonconvex Framework for Structured Dynamic Covariance Recovery**

## Abstract

Flexible, yet interpretable, models for the second-order temporal structure are needed in scientific analyses of high-dimensional data. We develop a structured time-indexed covariance model for dynamic time-series data by factorizing covariances into sparse spatial and temporally smooth components. Traditionally, time-indexed covariance models without structure require a large sample size to be estimable. While the covariance factorization results in both domain interpretability and ease of estimation from the statistical perspective, the resulting optimization problem used to estimate the model components are nonconvex. We design a two-stage optimization scheme with a carefully tailored spectral initialization, combined with iteratively refined alternating projected gradient descent. We prove a linear convergence rate with a nontrivial statistical error for the proposed descent scheme and establish sample complexity guarantees for the estimator. As a motivating example, we consider the neuroscience application of estimation of dynamic brain connectivity. Empirical results using simulated and real brain imaging data illustrate that our approach outperforms existing baselines.

[video]

September 04, 2020

**Necmiye Ozay** (Associate Professor, University of Michigan)

**A fresh look at a classical system identification method**

## Abstract

System identification has a long history with several well-established methods, in particular for learning linear dynamical systems from input/output data. While the asymptotic properties of these methods are well understood as the number of data points goes to infinity or the noise level tends to zero, how well their estimates in finite data regime evolve is relatively less studied. This talk will mainly focus on our analysis of the robustness of the classical Ho-Kalman algorithm and how it translates to non-asymptotic estimation error bounds as a function of number of data samples. If time permits, I will also mention another problem we study at the intersection of learning, control, and optimization: learning constraints from demonstrations as an alternative to inverse optimal control. Our experiments with several robotics problems show (local) optimality can be a very strong prior in learning from demonstrations. I will conclude the talk with some open problems and directions for future research.

[slides]

## Summer 2020 iDS^{2} Seminar

NOTE: all seminars for Fall 2020 are held remotely through Zoom. Recordings of seminars are available below. Seminars run on Friday at 2:00pm unless otherwise stated.

July 24, 2020

**Yuanzhi Li** (Assistant Professor, CMU)

**Backward feature correction: How can deep learning perform deep learning**

## Abstract

How does a 110-layer ResNet learn a high-complexity classifier using relatively few training examples and short training time? We present a theory towards explaining this deep learning process in terms of hierarchical learning. We refer to hierarchical learning as the learner learns to represent a complicated target function by decomposing it into a sequence of simpler functions, to reduce sample and time complexity. This work formally analyzes how multi-layer neural networks can perform such hierarchical learning efficiently and automatically simply by applying stochastic gradient descent (SGD) to the training objective.

Moreover, we present, to the best of our knowledge, the first theory result indicating how very deep neural networks can be sample and time efficient on certain hierarchical learning tasks, even when no known non-hierarchical algorithms (such as kernel method, linear regression over feature mappings, tensor decomposition, sparse coding, and their simple combinations) are efficient. We establish a new principle called “backward feature correction” to show how the features in the lower-level layers in the network can also be improved via training higher-level layers, which we believe is the key to understand the deep learning process in multi-layer neural networks.

May 01, 2020

**Cong Xie **(PhD student, UIUC)

**Byzantine Tolerance for Distributed SGD**

April 17, 2020

**Ziwei Ji **(PhD student, UIUC)

**Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks**

April 10, 2020

**Philip Amortila **(PhD student, UIUC)

**A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms**

April 03, 2020

**Nan Jiang **(Assistant Professor, UIUC)

**Minimax Methods for Off-policy Evaluation and Optimization**

March 27, 2020

**Shiyu Liang **(PhD student, UIUC)

**The Loss Landscape of Neural Networks**