Session title: Recent advances in high-dimensional statistical learning
Organizer: Xiaotong Shen (U of Minnesota)
Chair: Xiaotong Shen (U of Minnesota)
Time: June 4th, 11:00am – 12:30pm
Location: Hammer LL109 A/B
Speech 1: Multiclass Probability Estimation with Support Vector Machines
Speaker: Helen Zhang (Arizona State University)
Abstract: Multiclass classification and probability estimation have important applications in data analytics. We propose a simple and scalable estimation framework for multiclass probabilities based on kernel SVMs. The new estimator does not rely on any parametric assumption on the data distribution, and hence it is flexible and robust. Theoretically, we show that the proposed estimator is asymptotically consistent. Computationally, the new procedure can be conveniently implemented using standard SVM software. Our numerical studies demonstrate the competitive performance of the new estimator compared with existing methods such as multiple logistic regression, linear discriminant analysis (LDA), tree-based methods, and random forests (RF), under various settings.
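To illustrate the task the talk addresses, the sketch below estimates multiclass probabilities with a kernel SVM using scikit-learn's built-in probability calibration (pairwise Platt scaling). This is a generic baseline, not the speaker's proposed estimator, and the dataset is synthetic.

```python
# Generic illustration of multiclass probability estimation with a kernel
# SVM (scikit-learn's Platt-scaled SVC), NOT the estimator from the talk.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic 3-class data, purely for demonstration.
X, y = make_classification(n_samples=300, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# probability=True enables class-probability estimates via cross-validated
# Platt scaling on top of the pairwise SVM decision values.
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)   # each row sums to 1 over the 3 classes
print(probs[:3])
```

Nonparametric estimators like this inherit the SVM's flexibility: no distributional form is assumed, only a kernel choice.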
Speech 2: Minimizing Sum of Truncated Convex Functions and Its Applications
Speaker: Hui Jiang (University of Michigan)
Abstract: We study a class of problems in which a sum of truncated convex functions is minimized. In statistical applications, such problems are commonly encountered when L0-penalized models are fitted, and they usually lead to NP-hard non-convex optimization problems. We propose a general algorithm for finding the global minimizer in low-dimensional settings. We also extend the algorithm to high-dimensional settings, where an approximate solution can be found efficiently. We introduce several applications in which a sum of truncated convex functions is used, compare the proposed algorithm with existing algorithms in simulation studies, and show its utility in edge-preserving image restoration on real data.
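To make the problem class concrete, the sketch below finds the global minimizer of a one-dimensional sum of truncated quadratics, sum_i min((x - a_i)^2, c_i). It is an illustration of the general idea only (the breakpoints a_i ± sqrt(c_i) partition the line into intervals on which the set of untruncated terms is fixed, so each piece is a convex problem), not the algorithm from the talk.

```python
# Illustrative 1-D global minimization of a sum of truncated quadratics,
# sum_i min((x - a[i])^2, c[i]). Not the speaker's algorithm; it only
# demonstrates why truncation points make the problem piecewise convex.
import numpy as np

def truncated_quadratic_min(a, c):
    a, c = np.asarray(a, float), np.asarray(c, float)
    obj = lambda x: np.sum(np.minimum((x - a) ** 2, c))
    # Breakpoints where each term switches between quadratic and constant.
    bps = np.sort(np.concatenate([a - np.sqrt(c), a + np.sqrt(c)]))
    edges = np.concatenate([[bps[0] - 1.0], bps, [bps[-1] + 1.0]])
    best_x, best_f = edges[0], obj(edges[0])
    for lo, hi in zip(edges[:-1], edges[1:]):
        mid = 0.5 * (lo + hi)
        active = (mid - a) ** 2 < c     # untruncated terms on this interval
        # Minimizer of the active quadratics is their mean, clipped to the
        # interval so the active set remains valid.
        x = np.clip(a[active].mean(), lo, hi) if active.any() else mid
        if obj(x) < best_f:
            best_x, best_f = x, obj(x)
    return best_x, best_f

x_star, f_star = truncated_quadratic_min([0.0, 10.0], [1.0, 1.0])
print(x_star, f_star)
```

The number of intervals grows with the number of terms, which hints at why exact global minimization is tractable only in low dimensions.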
Speech 3: Uncertainty and Inference for High-Dimensional Models Using the Solution Paths
Speaker: Peng Wang (University of Cincinnati)
Abstract: Bootstrap-based model inference has been well studied. However, such approaches almost certainly fail for high-dimensional models because the selection results are highly sensitive to the choice of tuning parameter and are therefore extremely unstable. We propose to utilize the information in the entire solution path to overcome this obstacle. In particular, we select the best model based on the entire solution path for each bootstrap sample, and make inferences about both the model and its parameters using the results from all bootstrap samples. Moreover, we develop tools that visualize and quantify model selection uncertainty. These tools allow practitioners to evaluate the validity of estimated models in high-dimensional settings.
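A minimal sketch of the general strategy the abstract describes, under assumed details: for each bootstrap sample, compute an entire lasso solution path, pick the best model on that path by BIC rather than by a single fixed tuning parameter, and aggregate the selected supports into variable-inclusion frequencies. The path method, selection criterion, and data are illustrative choices, not the speaker's exact procedure.

```python
# Hypothetical sketch: bootstrap over entire lasso solution paths, with
# per-sample model selection by BIC. Illustrative choices throughout.
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]              # true support: first 3 variables
y = X @ beta + rng.standard_normal(n)

B = 50
freq = np.zeros(p)                        # variable selection frequencies
for b in range(B):
    idx = rng.integers(0, n, n)           # bootstrap resample
    Xb, yb = X[idx], y[idx]
    alphas, coefs, _ = lasso_path(Xb, yb)  # entire solution path
    # Score every model along the path with BIC; keep the best support.
    best_bic, best_support = np.inf, np.zeros(p, bool)
    for k in range(len(alphas)):
        support = coefs[:, k] != 0
        resid = yb - Xb @ coefs[:, k]
        bic = n * np.log(resid @ resid / n) + support.sum() * np.log(n)
        if bic < best_bic:
            best_bic, best_support = bic, support
    freq += best_support
freq /= B
print(freq)   # high frequencies flag variables that are stably selected
```

Because every bootstrap sample searches its whole path, no single tuning-parameter value has to be fixed in advance, which is the instability the abstract targets.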