Session 21: Advances in high-dimensional statistics

Session title: Advances in high-dimensional statistics
Organizer: Genevera Allen (Rice)
Chair: Genevera Allen (Rice)
Time: June 5th, 8:30am – 10:00am
Location: VEC 1303

Speech 1: Adaptive local estimation for high dimensional linear models
Speaker: Yufeng Liu (UNC)

Abstract: High dimensional linear models are commonly used in practice. In many applications, one is interested in linear transformations of the regression coefficients, such as prediction of the response. One common approach is the global technique, which first estimates the coefficients and then plugs the estimator into the linear transformation for prediction. Despite its popularity, regression estimation can be difficult for high dimensional problems. Commonly used assumptions in the literature include that the coefficients are sparse and that the predictors are weakly correlated. These assumptions, however, may not be easily verified and can be violated in practice. When the coefficients are not sparse or the predictors are strongly correlated, estimation of the coefficients can be very difficult. In this talk, I will present a new adaptive local estimator for linear transformations of the coefficients. This new estimator greatly relaxes the common assumptions for high dimensional problems. Simulation and theoretical results demonstrate the competitive advantages of the proposed method for a wide range of problems.
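
As a rough illustration of the global plug-in approach described in the abstract (not the speaker's adaptive local estimator), the sketch below first estimates the coefficients and then plugs them into a linear transformation, here a prediction at a new point. Ridge regression, the penalty `lam`, and all function names are assumptions made for illustration only.

```python
import numpy as np

def ridge_coefficients(X, y, lam=1.0):
    """Step 1 (global estimation): ridge estimate of the coefficient vector.

    Ridge is an assumed stand-in for whichever global estimator is used.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def plug_in_prediction(X, y, x_new, lam=1.0):
    """Step 2 (plug-in): apply the linear transformation x_new' beta_hat."""
    beta_hat = ridge_coefficients(X, y, lam)
    return x_new @ beta_hat

# Simulated high dimensional example: more predictors (p) than observations (n).
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 1.0  # sparse signal, chosen only for this illustration
y = X @ beta + 0.1 * rng.standard_normal(n)
x_new = rng.standard_normal(p)
pred = plug_in_prediction(X, y, x_new, lam=1.0)
```

The quality of `pred` depends entirely on how well the coefficients are estimated in the first step, which is the difficulty the abstract points to.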

Speech 2: Are Clusterings of Multiple Data Views Independent?
Speaker: Jacob Bien (Cornell)
Abstract: It has become increasingly common for scientists to collect more than one type of measurement on a single set of observations. For instance, a medical researcher might gather both clinical measurements and DNA sequences on a single set of individuals. Many “multi-view clustering” methods have been developed, which use the information from the different data views to determine a clustering of the observations. In this talk, however, we explore a more basic question: are the clustering structures from the different data views related or independent? We develop a hypothesis test for investigating this question on the basis of a set of data. This is joint work with Lucy Gao and Daniela Witten.

Speech 3: Regularized Robust Buckley-James method for AFT Model with General Loss Function
Speaker: Sijian Wang (Rutgers)
Abstract: In the last decade of genome research, one of the most popular topics has been relating large amounts of genetic information to clinical survival phenotypes. As an alternative to the popular Cox proportional hazards model, the Accelerated Failure Time (AFT) model specifies an association between survival time and covariates directly. Consequently, it has a simpler and possibly more intuitive interpretation than Cox’s model, which is based on the hazard function. The Buckley-James method is a popular estimation method for the AFT model. It iterates between the imputation of failure times for censored subjects and the estimation of regression coefficients. In the estimation step, the least squares criterion with a quadratic loss function is used. It is well known that, for regression with uncensored data, the traditional estimate resulting from the quadratic loss function may not be robust when the variability of the response is high and/or there are outliers. In this talk, we propose a regularized robust Buckley-James method that can incorporate general loss functions, including the absolute loss function, the quantile loss function, Huber’s loss function and Tukey’s bisquare loss function. The proposed methods are demonstrated using simulation studies and an analysis of a TCGA ovarian cancer dataset.
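
For reference, the loss functions named in the abstract can be written down in a few lines. This is a minimal sketch, not the speaker's implementation; the function names and the default tuning constants (`delta=1.345` and `c=4.685`, conventional choices for standardized residuals) are assumptions.

```python
import numpy as np

def absolute_loss(r):
    """Absolute loss |r|."""
    return np.abs(np.asarray(r, dtype=float))

def quantile_loss(r, tau=0.5):
    """Quantile (check) loss: tau * r for r >= 0, (tau - 1) * r otherwise."""
    r = np.asarray(r, dtype=float)
    return np.where(r >= 0, tau * r, (tau - 1.0) * r)

def huber_loss(r, delta=1.345):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    r = np.asarray(r, dtype=float)
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def tukey_bisquare_loss(r, c=4.685):
    """Tukey's bisquare loss: bounded, constant at c^2 / 6 once |r| > c."""
    r = np.asarray(r, dtype=float)
    inside = (c**2 / 6.0) * (1.0 - (1.0 - (r / c) ** 2) ** 3)
    return np.where(np.abs(r) <= c, inside, c**2 / 6.0)
```

Unlike the quadratic loss, the absolute, quantile and Huber losses grow only linearly for large residuals, and the bisquare loss is bounded, so a single outlying response contributes less to the fitted criterion.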