Department of Statistics Seminars | Undergraduate Advising in Statistics

We will have two seminars during the week of December 10 – 14 and one seminar on December 17. The abstracts for these seminars are below. Please check the dates and locations, as they vary.

Nonparametric Estimation of Spatial Covariance Function and Its Applications, Bo Li (Purdue University)

Speaker: Bo Li, Purdue University

Date: December 11, 2012

Time: 4:00 pm – 4:50 pm

Location: 165 Noyes Lab

Sponsor: Department of Statistics

Event type: Seminar

Abstract:

Covariance structure modeling plays a key role in spatial data analysis. Various parametric models have been developed to accommodate the idiosyncratic features of a given data set. However, parametric models may impose unjustified restrictions on the covariance structure, and the procedure for choosing a specific model is often ad hoc. To avoid the choice of parametric forms, we propose a nonparametric covariance estimator for spatial data, as well as its extension to spatio-temporal data based on the class of space-time covariance models developed by Gneiting (2002). Our estimator is obtained via a nonparametric approximation of completely monotone functions. It is easy to implement, and our simulation shows that it outperforms parametric models when there is no clear information on model specification. Two real data examples are analyzed to illustrate our approach and to provide further comparison between the nonparametric estimator and parametric models. Finally, I will also discuss the potential application of our nonparametric covariance estimator to paleoclimatology.
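For readers unfamiliar with the term, the following is a sketch of why completely monotone functions are a natural building block here. The abstract does not spell out the construction, so the notation ($\varphi$, $F$, $w_k$, $r_k$, $m$) is an assumption and this is not necessarily the speaker's estimator. By Bernstein's theorem, a function $\varphi$ on $(0,\infty)$ is completely monotone exactly when it is the Laplace transform of a nonnegative measure $F$, and by Schoenberg's theorem $C(h) = \varphi(\lVert h \rVert^2)$ is then a valid isotropic covariance in every dimension. Replacing $F$ by a discrete measure with finitely many atoms gives a flexible family that is completely monotone by construction:

```latex
\varphi(t) = \int_0^\infty e^{-rt}\, dF(r),
\qquad
\varphi_m(t) = \sum_{k=1}^{m} w_k\, e^{-r_k t},
\quad w_k \ge 0,\ r_k \ge 0 .
```

Under this reading, fitting the weights $w_k$ on a grid of rates $r_k$ would be a constrained least-squares type problem rather than a choice among named parametric families.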

********************************************************************************************

Feature Screening for Ultrahigh Dimensional Data, Runze Li (Pennsylvania State University)

Speaker: Runze Li, Pennsylvania State University

Date: December 13, 2012

Time: 4:00 pm – 4:50 pm

Location: 156 Henry

Sponsor: Department of Statistics

Event type: Seminar

Abstract:

Ultrahigh dimensional data analysis has become increasingly important in diverse scientific fields. Fan and Lv (2008) proposed the sure independence screening (SIS) procedure based on the Pearson correlation and established the sure screening property for the SIS under Gaussian linear models. In subsequent work by Fan and colleagues, various model-based feature screening procedures have been developed for pre-specified regression models with ultrahigh dimensional predictors. In this talk, I plan to introduce model-free feature screening procedures for ultrahigh dimensional data. By model-free, we mean that implementing such procedures does not require pre-specifying a model structure for the regression function. Model-free feature screening is therefore particularly appealing for ultrahigh dimensional problems, where there are a huge number of candidate predictors but little information about the actual model form. I will focus on a sure independence screening procedure based on the distance correlation (DC-SIS, for short). The DC-SIS is a model-free screening procedure and can be implemented as easily as the SIS, yet it can improve significantly on the SIS. In particular, the sure screening property holds for the DC-SIS under more general settings, including linear models. Moreover, the DC-SIS can be used directly to screen grouped predictor variables and to handle multivariate response variables. We establish the sure screening property for the DC-SIS and conduct simulations to examine its finite sample performance. Numerical comparison indicates that the DC-SIS performs much better than the SIS in various models. We also illustrate the DC-SIS through a real data example.
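To make the screening idea concrete, here is a minimal sketch, not the speaker's implementation. It computes the sample distance correlation of Székely, Rizzo and Bakirov from double-centered pairwise distance matrices, scores each predictor against the response, and keeps the top-ranked ones. The function names (`distance_correlation`, `dc_sis`) and the cutoff parameter `d` are illustrative assumptions; the abstract does not say how many predictors to retain.

```python
import numpy as np


def _centered_distances(v):
    """Pairwise Euclidean distance matrix of v, double-centered."""
    v = np.asarray(v, dtype=float)
    if v.ndim == 1:
        v = v[:, None]  # treat a vector as n observations of one variable
    dist = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    row_mean = dist.mean(axis=1, keepdims=True)
    col_mean = dist.mean(axis=0, keepdims=True)
    return dist - row_mean - col_mean + dist.mean()


def distance_correlation(x, y):
    """Sample distance correlation between x and y (each with n rows)."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()  # squared sample distance covariance
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:
        return 0.0  # one of the inputs is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def dc_sis(X, y, d):
    """Rank the columns of X by distance correlation with y; keep the top d.

    `d` is a user-chosen cutoff (an assumption of this sketch).
    Returns the retained column indices and the full score vector.
    """
    X = np.asarray(X, dtype=float)
    scores = np.array(
        [distance_correlation(X[:, k], y) for k in range(X.shape[1])]
    )
    keep = np.argsort(scores)[::-1][:d]
    return keep, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))
    # The response depends on column 0 only through its square, a
    # relationship with zero population Pearson correlation.
    y = X[:, 0] ** 2 + 0.1 * rng.normal(size=200)
    keep, scores = dc_sis(X, y, d=5)
    print("retained columns:", keep)
```

Because `y` may have several columns and a block of columns of `X` can be passed to `distance_correlation` in place of a single one, the same score extends to multivariate responses and grouped predictors, which is the flexibility the abstract points to. Ranking by the distance correlation or by its square gives the same ordering.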

********************************************************************************************

Statistical Significance of Clustering for High Dimensional Data, Yufeng Liu (University of North Carolina-Chapel Hill)

Speaker: Yufeng Liu, University of North Carolina-Chapel Hill

Date: December 17, 2012

Time: 4:00 pm – 4:50 pm

Location: 1027 Lincoln Hall

Sponsor: Department of Statistics

Event type: Seminar

Abstract:

Clustering methods provide a powerful tool for the exploratory analysis of high dimensional datasets, such as gene expression microarray data. A fundamental statistical issue in clustering is which clusters are “really there,” as opposed to being artifacts of natural sampling variation. In this talk, I will present Statistical Significance of Clustering (SigClust) as a cluster evaluation tool. In particular, we define a cluster as data coming from a single Gaussian distribution and formulate the problem of assessing the statistical significance of clustering as a testing procedure. Under this hypothesis testing framework, the cornerstone of our SigClust analysis is accurate estimation of the eigenvalues of the covariance matrix of the null multivariate Gaussian distribution. A likelihood-based soft thresholding approach is proposed for the estimation of the covariance matrix eigenvalues. Our theoretical work and simulation studies show that our proposed SigClust procedure works remarkably well. Applications to some cancer microarray data examples demonstrate the usefulness of SigClust.

Department of Statistics, University of Illinois at Urbana-Champaign