Session 13: Flexible Statistical Learning and Inference – Conference on Statistical Learning and Data Science / Nonparametric Statistics

Session title: Flexible Statistical Learning and Inference
Organizer: Yufeng Liu (UNC)
Chair: Siliang Gong (UNC)
Time: June 4th, 1:45pm – 3:15pm
Location: VEC 405

Speech 1: Multi-layered Graphical Models
Speaker: Min Jin Ha (MD Anderson)
Abstract: Simultaneous modeling of data arising from multiple ordered layers provides insight into the holistic picture of the interactive system and the flow of information. Chain graphs have been used to model the layered architecture of networks where the vertices can be naturally partitioned into ordered layers that exhibit undirected and directed acyclic relations within and between the layers. We develop a multi-layered Gaussian graphical model (mlGGM) to investigate conditional independence structures in probabilistic chain graphs. Our proposed model uses a Bayesian node-wise selection framework that coherently accounts for dependencies in the mlGGM. Using Bayesian variable selection strategies for each of the node-wise regressions allows for flexible modeling, sparsity, and incorporation of edge-specific prior knowledge. Through simulated data generated from various scenarios, we demonstrate that our node-wise regression method outperforms other related multivariate regression-based methodologies. We apply mlGGM to identify integrative networks for key signaling pathways in kidney cancer and dynamic signaling networks using longitudinal proteomics data in breast cancer.
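
As background for the node-wise regression idea mentioned in this abstract, the standard identity for an ordinary (single-layer, undirected) Gaussian graphical model can be written as follows. This is textbook material and not the speaker's mlGGM; the symbols $\Omega$, $\omega_{jk}$, $\beta_{jk}$, and $\varepsilon_j$ are notation introduced here for illustration.

```latex
% Assumption: X = (X_1, ..., X_p) ~ N(0, \Sigma), with precision matrix \Omega = \Sigma^{-1} = (\omega_{jk}).
\begin{align*}
X_j &= \sum_{k \neq j} \beta_{jk} X_k + \varepsilon_j,
  & \beta_{jk} &= -\frac{\omega_{jk}}{\omega_{jj}},
  & \operatorname{Var}(\varepsilon_j) &= \frac{1}{\omega_{jj}}, \\
\beta_{jk} = 0
  &\iff \omega_{jk} = 0
  \iff X_j \perp X_k \mid X_{-\{j,k\}}.
\end{align*}
```

Selecting the nonzero coefficients in each node's regression therefore recovers the edges of the graph, which is why variable selection can be carried out one node at a time.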

Speech 2: Variable Selection for Highly Correlated Predictors
Speaker: Fei Xue (UIUC)
Abstract: Penalty-based variable selection methods are powerful tools for selecting relevant covariates and estimating coefficients simultaneously. However, variable selection can fail to be consistent when covariates are highly correlated. The partial correlation approach has been adopted to address the problem of correlated covariates. Nevertheless, the restricted range of partial correlation makes it ineffective at capturing the signal strength of relevant covariates. In this paper, we propose a new Semi-standard PArtial Covariance (SPAC), which reduces correlation effects from other predictors while incorporating the magnitude of coefficients. The proposed SPAC variable selection facilitates choosing covariates that have a direct association with the response variable by exploiting dependency among covariates. We show that the proposed method with the Lasso penalty (SPAC-Lasso) enjoys strong sign consistency in both finite-dimensional and high-dimensional settings under regularity conditions. Simulation studies and the ‘HapMap’ gene data application demonstrate that the proposed method outperforms the traditional Lasso, adaptive Lasso, SCAD, and Peter–Clark-simple (PC-simple) methods for highly correlated predictors.
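
To illustrate the partial correlation idea that this abstract builds on, here is a minimal sketch using the standard three-variable formula. It is not the SPAC method itself. The function name `partial_corr` and the correlation values are hypothetical, chosen so that the response `y` depends on `x2` only through `x1`.

```python
import math


def partial_corr(r_ab, r_ac, r_bc):
    """Correlation between a and b after removing the linear effect of c."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Hypothetical population correlations (illustrative values only).
r_y_x1 = 0.9    # y is driven by x1
r_x1_x2 = 0.9   # x1 and x2 are highly correlated
r_y_x2 = 0.81   # y relates to x2 only through x1: 0.9 * 0.9

# The marginal correlation of y with x2 is large, but it vanishes given x1.
pc_y_x2 = partial_corr(r_y_x2, r_y_x1, r_x1_x2)

# The partial correlation of y with x1 given x2 remains clearly nonzero.
pc_y_x1 = partial_corr(r_y_x1, r_y_x2, r_x1_x2)

print(round(pc_y_x2, 6), round(pc_y_x1, 3))
```

Both values lie in [-1, 1] whatever the size of the underlying regression coefficient, which is the restricted range the abstract refers to.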

Speech 3: Support Vector Machine with Confidence
Speaker: Xingye Qiao (SUNY Binghamton)
Abstract: Classification with confidence (Lei, 2014) is a new type of problem in statistical learning. Its goal, in the binary case, is to identify two regions with a specific coverage probability for each class. We propose a support vector classifier to achieve classification with confidence. The classifier has two boundaries. An observation outside of the two boundaries is deemed to be from one of the two classes (with certain confidence), while the region between the boundaries is an ambiguity region, in which an observation could belong to either class. In the theoretical study, we show a Fisher consistency result and that, with high probability, the resulting classifier can control the non-coverage rates and minimize the ambiguity. Efficient algorithms are developed and numerical studies are conducted to illustrate the effectiveness of the proposed method. This is joint work with Wenbo Wang.
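
The two-boundary decision rule described in this abstract can be sketched with a simple plug-in approach: take any real-valued score and choose two thresholds from held-out scores so that the empirical non-coverage rate of each class is at most `alpha`. This is not the speakers' support vector method. The function names `calibrate` and `classify`, the scores, and the value of `alpha` are all assumptions made for illustration.

```python
def calibrate(scores_neg, scores_pos, alpha):
    """Pick (lower, upper) so that at most a fraction alpha of each class
    falls in the region assigned with confidence to the other class."""
    pos = sorted(scores_pos)
    neg = sorted(scores_neg)
    k_pos = int(alpha * len(pos))        # positives allowed below `lower`
    k_neg = int(alpha * len(neg))        # negatives allowed above `upper`
    lower = pos[k_pos]
    upper = neg[len(neg) - 1 - k_neg]
    if lower > upper:
        # Classes are well separated: one boundary suffices, no ambiguity band.
        lower = upper = (lower + upper) / 2
    return lower, upper


def classify(score, lower, upper):
    """Return -1 or +1 for a confident call, 0 for the ambiguity region."""
    if score < lower:
        return -1
    if score > upper:
        return 1
    return 0


# Hypothetical held-out scores for each class (illustrative values only).
scores_neg = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, -1.2, -0.8, -0.3, 0.2]
scores_pos = [2.0, 1.5, 1.0, 0.5, 0.0, -0.5, 1.2, 0.8, 0.3, -0.2]

lower, upper = calibrate(scores_neg, scores_pos, alpha=0.1)
labels = [classify(s, lower, upper) for s in (-1.0, 0.0, 1.0)]
print(lower, upper, labels)
```

A smaller `alpha` moves the two thresholds apart and widens the ambiguity region, which is the trade-off between non-coverage and ambiguity that the abstract describes.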