Session 38: New machine learning methods

Session title: New machine learning methods
Organizer: Annie Qu (UIUC)
Chair: Annie Qu (UIUC)
Time: June 6th, 8:30am – 10:00am
Location: VEC 1302

Speech 1: Generalized self-concordant optimization and its applications in statistical learning
Speaker: Quoc Tran-Dinh (UNC)
Abstract: Many statistics and machine learning applications can be cast as composite convex minimization problems. Well-known examples include sparse logistic regression, SVM, and inverse covariance estimation. These problems are well studied and can be solved efficiently by several state-of-the-art methods. Recent developments in first-order, second-order, and stochastic gradient-type methods have created new opportunities to solve many other classes of convex optimization problems in large-scale settings. Unfortunately, such methods so far require the underlying models to satisfy structural assumptions such as Lipschitz gradient and restricted strong convexity, which may fail to hold or may be hard to check.
In this talk, we demonstrate how to exploit an analytical structure hidden in convex optimization problems to develop solution methods. Our key idea is to generalize the powerful concept of “self-concordance”, introduced by Y. Nesterov and A. Nemirovskii, to a broader class of convex functions. We show that this structure covers many applications in statistics and machine learning. We then develop a unified theory for designing numerical methods, which we illustrate through Newton-type and proximal Newton-type methods. We note that the proposed theory can further be applied to develop other methods as long as the underlying model involves a “generalized self-concordant structure”. We provide numerical examples from different fields to illustrate our theoretical development.
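
As a concrete illustration of the kind of methods covered, below is a minimal sketch of a damped Newton iteration for l2-regularized logistic regression, whose loss is a standard example of a generalized self-concordant function. The damping rule 1/(1 + decrement) is the classical step from self-concordant analysis, used here only as a stand-in for the talk's step-size rules; the regularization level, tolerance, and toy data are illustrative assumptions.

    import numpy as np

    def damped_newton_logreg(X, y, lam=1e-3, tol=1e-8, max_iter=50):
        """Damped Newton method for l2-regularized logistic regression,
        f(w) = (1/n) * sum_i log(1 + exp(-y_i x_i^T w)) + (lam/2) ||w||^2,
        with the step damped by the Newton decrement in the spirit of
        (generalized) self-concordant analysis."""
        n, p = X.shape
        w = np.zeros(p)
        for _ in range(max_iter):
            z = y * (X @ w)
            s = 1.0 / (1.0 + np.exp(z))               # sigmoid(-z)
            grad = -(X * (y * s)[:, None]).mean(0) + lam * w
            W = s * (1.0 - s)                          # logistic Hessian weights
            H = (X * W[:, None]).T @ X / n + lam * np.eye(p)
            d = np.linalg.solve(H, grad)               # Newton direction
            decrement = np.sqrt(grad @ d)              # local norm of the gradient
            if decrement < tol:
                break
            w -= d / (1.0 + decrement)                 # damped Newton step
        return w

    # toy usage
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 10))
    w_true = rng.normal(size=10)
    y = np.sign(X @ w_true + 0.5 * rng.normal(size=500))
    print(damped_newton_logreg(X, y)[:3])

The damping keeps the step short when the Newton decrement is large and approaches a full Newton step near the solution, which is where self-concordance-type bounds guarantee fast local convergence.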

Speech 2: On Scalable Inference with Stochastic Gradient Descent
Speaker: Yixin Fang (New Jersey Institute of Technology)

Abstract: In many applications involving large datasets or online updating, stochastic gradient descent (SGD) provides a scalable way to compute parameter estimates and has gained increasing popularity due to its numerical convenience and memory efficiency. While the asymptotic properties of SGD-based estimators were established decades ago, statistical inference such as interval estimation remains largely unexplored. Traditional resampling methods such as the bootstrap are not computationally feasible since they require repeatedly drawing independent samples from the entire dataset. The plug-in method is not applicable when there is no explicit formula for the covariance matrix of the estimator. In this paper, we propose a scalable inferential procedure for stochastic gradient descent which, upon the arrival of each observation, updates the SGD estimate as well as a large number of randomly perturbed SGD estimates. The proposed method is easy to implement in practice. We establish its theoretical properties for a general class of models that includes generalized linear models and quantile regression models as special cases. The finite-sample performance and numerical utility are evaluated by simulation studies and two real data applications.
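
To make the updating scheme concrete, here is a minimal sketch of a perturbation-based procedure of this flavor for streaming linear regression: each incoming observation updates the main SGD estimate and B randomly re-weighted copies, and the spread of the copies is used for interval estimation. The exponential weights, step-size schedule, number of copies, and the use of plain (rather than averaged) SGD iterates are illustrative assumptions, not the authors' exact specification.

    import numpy as np

    def sgd_with_perturbed_copies(data_stream, dim, B=200, gamma0=0.5, alpha=0.501, seed=0):
        """Streaming SGD for linear regression plus B randomly perturbed SGD copies.
        Upon each observation (x_t, y_t), the main estimate and every perturbed
        copy are updated once; no data need to be stored."""
        rng = np.random.default_rng(seed)
        theta = np.zeros(dim)                # main SGD estimate
        theta_b = np.zeros((B, dim))         # B perturbed SGD estimates
        for t, (x, y) in enumerate(data_stream, start=1):
            gamma_t = gamma0 * t ** (-alpha)             # decaying step size
            grad = (x @ theta - y) * x                   # squared-loss gradient
            theta -= gamma_t * grad
            # each copy re-weights its gradient by a positive random weight
            # with mean one (standard exponential, an illustrative choice)
            w = rng.exponential(1.0, size=B)
            grad_b = (theta_b @ x - y)[:, None] * x[None, :]
            theta_b -= gamma_t * w[:, None] * grad_b
        return theta, theta_b

    # toy usage: 95% interval for each coordinate from the perturbed copies
    rng = np.random.default_rng(1)
    n, dim = 5000, 3
    X = rng.normal(size=(n, dim))
    beta = np.array([1.0, -0.5, 0.0])
    y = X @ beta + rng.normal(size=n)
    theta, theta_b = sgd_with_perturbed_copies(zip(X, y), dim)
    lo, hi = np.percentile(theta_b, [2.5, 97.5], axis=0)
    print(theta, lo, hi)

Because only the current estimate and the B copies are kept in memory, the cost per observation is O(B * dim), in contrast to the bootstrap, which would require repeated passes over the full dataset.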

Speech 3: Scalable Kernel-based Variable Selection with Sparsistency
Speaker: Junhui Wang (City U. of Hong Kong)
Abstract: Variable selection is central to sparse modeling, and many methods have been proposed under various model assumptions. In this talk, we will present a scalable framework for model-free variable selection in a reproducing kernel Hilbert space (RKHS) without specifying any restrictive model. As opposed to most existing model-free variable selection methods, which require fixed dimension, the proposed method allows the dimension p to diverge with the sample size n. The proposed method is motivated by the classical hard-threshold variable selection for linear models, but allows for general variable effects. It does not require specification of the underlying model for the response, which is appealing in sparse modeling with a large number of variables. The proposed method can also be adapted to various scenarios with specific model assumptions, including linear models, quadratic models, as well as additive models. The asymptotic estimation and variable selection consistencies of the proposed method are established in all these scenarios. If time permits, the extension of the proposed method beyond mean regression will also be discussed.
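
To illustrate the spirit of hard-threshold variable selection with general, kernel-based variable effects, below is a minimal sketch that fits a Gaussian-kernel ridge regression and scores each variable by the empirical norm of the fitted function's partial derivative in that coordinate, then hard-thresholds the scores. The gradient-based score, the kernel, and the threshold choice are illustrative assumptions; the talk's selection statistic and tuning may differ.

    import numpy as np

    def gaussian_kernel(X, Z, sigma=1.0):
        """Gaussian kernel matrix K[i, j] = exp(-||X_i - Z_j||^2 / (2 sigma^2))."""
        sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * sigma ** 2))

    def kernel_gradient_scores(X, y, lam=1e-2, sigma=1.0):
        """Fit kernel ridge regression and score each variable by the empirical
        L2 norm of the fitted function's partial derivative in that coordinate."""
        n, p = X.shape
        K = gaussian_kernel(X, X, sigma)
        alpha = np.linalg.solve(K + lam * n * np.eye(n), y)   # KRR coefficients
        # d/dx_j K(x, x_i) = -K(x, x_i) * (x_j - x_ij) / sigma^2
        diff = X[:, None, :] - X[None, :, :]                  # shape (n, n, p)
        grads = -(K[:, :, None] * diff / sigma ** 2 * alpha[None, :, None]).sum(1)
        return np.sqrt((grads ** 2).mean(0))                  # one score per variable

    def hard_threshold_select(scores, tau):
        """Keep variables whose gradient score exceeds the threshold tau."""
        return np.where(scores > tau)[0]

    # toy usage: y depends only on the first two of five variables
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=200)
    scores = kernel_gradient_scores(X, y)
    print(scores, hard_threshold_select(scores, tau=0.5 * scores.max()))

Because the score is computed from the fitted RKHS function rather than from regression coefficients, it captures nonlinear effects while retaining the simple keep-or-drop structure of hard-threshold selection.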