The goal of Machine Learning Theory is to understand fundamental principles and capabilities of learning from data, as well as designing and analyzing machine learning algorithms. We invite you to the Machine Learning Theory Session of CSL student conference if you are curious about when, how, and why machine learning algorithms work.

The session consists of a keynote speech followed by several student talks in which students present their current research. Besides the theoretical aspects of machine learning, this session covers topics including (but not limited to) statistical inference, algorithms, graphical models, signal processing, etc.

*Time and place – Wednesday Feb. 6, 9am-12pm, CSL B02 *

### Keynote Speaker: Aleksander Madry – MIT

*Machine Learning and Security: The Good, the Bad, and the Hopeful*

*Summary* – Machine learning has made a tremendous progress over the last decade. In fact, many believe now that ML techniques are a “silver bullet”, capable of making progress on any real-world problem they are applied to. But is that really so?In this talk, I will discuss a major challenge in the real-world deployment of ML: making our ML solutions be robust, reliable and secure. In particular, I will survey the widespread vulnerabilities of state-of-the-art ML models to various forms of adversarial noise and then outline promising approaches to alleviating these deficiencies.

*Bio* – Aleksander Madry is the NBX Associate Professor of Computer Science in the MIT EECS Department and a principal investigator in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). He received his PhD from MIT in 2011 and, prior to joining the MIT faculty, he spent some time at Microsoft Research New England and on the faculty of EPFL. Aleksander’s research interests span algorithms, continuous optimization, science of deep learning and understanding machine learning from a robustness perspective. His work has been recognized with a number of awards, including an NSF CAREER Award, an Alfred P. Sloan Research Fellowship, an ACM Doctoral Dissertation Award Honorable Mention, and 2018 Presburger Award.

### Jack Wang – Rice University

*A Max-Affine Spline Perspective of Recurrent Neural Networks*

*Summary* – We develop a framework for understanding and improving recurrent neural networks (RNNs) using max-affine spline operators (MASO). We prove that RNNs using piecewise affine and convex nonlinearities can be written as a simple piecewise affine spline operator. The resulting representation provides several new perspectives for analyzing RNNs, three of which we study. First, we show that an RNN internally partitions the input spaceduring training and that it builds up the partition through time. Second, we show that the affine parameter of an RNN corresponds to an input-specific template, from which we can interpret an RNN as performing a simple template matching (matched filtering) given the input. Third, by closely examining the MASO RNN formula, we prove that injecting Gaussian noise in the initial hidden state in RNNs corresponds to an explicit L2 regularization on the affine parameters, which links to exploding gradient issues and improves generalization. Extensive experiments on several datasets of various modalities demonstrate and validate each of the above analyses. In particular, using initial hidden states elevates simple RNNs to state-of-the-art performance on these datasets.

*Bio* – Jack Wang is a 3rd year Ph.D. student in the Department of Electrical and Computer Engineering at Rice University advised by Richard Baraniuk. His research interests lie in the intersection of machine learning and personalized education.

### Donghwan Lee – UIUC

*Stochastic Primal-Dual Q-Learning*

*Summary* – The problem of learning a map from world observations to actions, called a policy, lies at the core of many sequential decision problems, such as robotics and artificial intelligence(AI). The policy development is often very challenging in many real-world applications as finding accurate world models is difficult under complex interactions between the decision maker and environment. Reinforcement learning (RL) is a subfield of machine learning which addresses the problem of how a decision maker can learn an optimal policy to maximize long-term cumulative rewards(value function), while interacting with unknown environment. RL has recently captured significant attentions in theAI and control community for outperforming human in several challenging tasks, such as Atari video games, AlphaGo (human versus computer competition). In this work, we present a new model-free and off-policy reinforcement learning (RL) algorithm, that is capable of finding a near-optimal policy with state-action observations from arbitrary behavior policies. Our algorithm, called the stochastic primal-dual Q-learning (SPD Q-learning),hinges upon a new linear programming formulation and a dual perspective of the standard Q-learning. In contrast to previous primal-dual RL algorithms, the SPD Q-learning includes a Q-function estimation step, thus allowing to recover an approximate policy from the primal solution as well as the dual solution. We prove a first-of-its-kind result that the SPD Q-learning guarantees a certain convergence rate, even when the state-action distribution is time-varying but sub-linearly converges to a stationary distribution. Numerical experiments are provided to demonstrate the off-policy learning abilities of the proposed algorithm in comparison to the standard Q-learning.

*Bio* – Donghwan Lee isa postdoctoral research associate in the Coordinated Science Laboratory (CSL) at the University of Illinois at Urbana-Champaign. He received his M.S. in Mathematics and Ph.D. in Electrical and Computer Engineering from Purdue University in 2017. He was an Associate Editor of IEEE Transactions on Fuzzy Systems in 2015. His research interests lie broadly in the areas of optimization and control theory. His current research focuses on developing rigorous frameworks for dynamic programming and reinforcement learning that enable efficient operations of stochastic systems with human factors. His most recent research interests are reinforcement learning and its applications to robotics and building control with human interactions.

### Kiran Thekumparampil – UIUC

*Robust training of Conditional GANs with noisy labeled data*

*Summary* – Conditional Generative Adversarial Networks (cGAN) have achieved tremendous success in learning to sample from complex distributions. They can generate fake images which superficially look realistic to the human eye. In this talk, we will look at the problem of learning conditional generators from noisy labeled samples, where the labels are corrupted by random noise. A standard training of conditional GANs will not only produce samples with wrong labels, but also generate poor quality samples. For the case of known noise model, a Robust Conditional GAN (RCGAN) will be introduced along with theoretical guarantees and finite sample analysis. When the noise model is unknown, an extension (RCGAN-U) will be introduced, which learns the noise model simultaneously while training the generator, and empirically performs as good the known noise case.

*Bio* – Kiran Thekumparampil is a 5th year PhD candidate with the department of Electrical and Computer Engineering, advised by Prof. Sewoong Oh. He received the B.Tech. degree in Electrical Engineering from Indian Institute of Technology (IIT) Madras, India in 2014. He is broadly interested in machine learning algorithms and optimization theory. He works on ranking, non-convex optimization and un/semi-supervised problems.

### Purnima Ghale – UIUC

*O(N) algorithms for electronic structure computations*

*Summary* – Extending the size of systems whose electronic structure can be computed is an ongoing effort at various levels of electronic structure theory. Despite computational advances, however, most electronic structure simulations are limited to a few thousand atoms, mainly because accurate computations require the diagonalization of the Hamiltonian matrix, *H*. Unlike most engineering problems, where a few well-chosen modes are generally sufficient, the number of eigenvectors required scales with system size, or the number of electrons, *N _{el}*. In particular, we seek to compute the density matrix,

*P*, a projection matrix constructed out of the

*N*eigenvectors of

_{el}/2*H*.In this talk, I will discuss two approaches that result in O(N) algorithms for computing

*P*, and our recent contributions to these areas. First, I will discuss how localization, graph theory and adaptive parallel computing can be exploited (Ghaleet. al, SIAM J. Sci. Comput., 2017) to obtain linear scaling. While localization-based methods can be useful, they have subtle limitations which inhibit their applicability to generic systems and basis sets, and often are memory-expensive.Thus, I will also discuss a new method that uses approximation theory to obtain

*P*implicitly, which is then sampled via Monte Carlo (Ghaleand Johnson, Comput. Phys. Commun., 2018). In this context, although implicit, general and linearly scaling, the algorithm showcases the necessity for further improvement in stochastic sampling and trace estimators, motivating some of our future work. Finally, I will present our preliminary attempts to solve for a billion atoms on Blue Waters.

*Bio* – Purnima Ghale is a graduate student in Professor Harley T. Johnson’s group in the Department of Mechanical Science and Engineering, and is part of the Center for Exascale Computation of Plasma Assisted Combustion (XPACC) at the University of Illinois at Urbana-Champaign. She is interested in developing computational methods for large scale electronic structure computations, and applying these methods to investigate electron transfer. She holds a Master’s degree in Mechanical Science and Engineering from the University of Illinois at Urbana-Champaign, and an undergraduate degree from Stanford University.

### Sam Spencer – UIUC

*Formation and Evolution of Multilayer Social and Societal Networks Using Agent-Based Models, with Applications to Food, Energy, and Water Systems*

*Summary* – I am studying multilayer networks with two types of layers: awareness layers and active layers. Awareness layers represent avenues by which information is made available to nodes, such as social media links. Active layers represent more concrete bonds that satisfy higher goals, such as buyer-seller relationships.Rather than merely looking at static relationships or centrally designed and coordinated structures, we will consider an agent-based framework. Nodes themselves seek to build (or weight) links in active layers that maximize an objective function representing their perceived utility (subject to constraints). They do so based on information available through their connections in awareness layers. My research is divided into two stages. During the first stage, we are studying various agent-based models –investigating various classes of optimizations (choice of objective functions, information models, etc.)and looking at characteristics of the resulting networks –convergence, degree distribution, distance properties, sensitivity to initial inputs, etc. The second stage seeks to apply the findings of the first to datasets from the NSF FEWSION (Food, Energy, and Water Systems) project. We will look at the relationships between geographic regions, and focus on issues of resource security and supply resilience –in the event of a disruption, which regions are best positioned to reorient themselves to find alternate suppliers? We will seek to identify structural properties of the underlying networks that give rise to that security (on either a node or network level), and match these with behaviors used to form networks (or neighborhoods) with those characteristics. While this work is still in an early state, we will present some examples and illustrations drawn from the work to date, providing a flavor of the results to come.

*Bio* – Sam Spencer received his B.A. degree in Mathematics and Computational and Applied Math (Mathematical Sciences) from Rice University, and his M.S. degree in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign. He is currently pursuing a Ph.D. degree in Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. Between 2006 and 2012, he worked as a Senior Electrical Engineer in the Advanced Technology Center at Rockwell Collins in Cedar Rapids, IA. His other work experience includes the Massachusetts Institute of Technology, ADTRAN, and the US Department of Defense.He has been awarded a National Science Foundation (NSF) Graduate Fellowship, as well as a Computational Science and Engineering Graduate Fellowship, and was a William Lowell Putnam Mathematics Competition honoree.He holds five US patents in areas such as signal detection, feature estimation, and cognitive networked electronic warfare. His research interests include social and societal networks, signal analysis, information security, algorithms, information theory, communications systems, and waveform design.