Session 31: Statistical Inference for High-Dimensional Data

Session title: Statistical Inference for High-Dimensional Data
Organizer: Jeff Simonoff (NYU)
Chair: Jeff Simonoff (NYU)
Time: June 5th, 3:15pm – 4:45pm
Location: VEC 1302

Speech 1: Quantile regression for big data with small memory
Speaker: Xi Chen (NYU)
Abstract: In this talk, we discuss the inference problem of quantile regression for a large sample size $n$ (and a diverging dimensionality $p$) but under a limited memory constraint, where the memory can only store a small batch of data of size $m$. A popular approach, the naive divide-and-conquer method, only works when $n=o(m^2)$ and is computationally expensive. This talk proposes a novel inference approach and establishes an asymptotic normality result showing that the proposed estimator achieves the same efficiency as the quantile regression estimator computed on all the data. Essentially, our method allows an arbitrarily large sample size $n$ relative to the memory size $m$. Our method can also be applied to quantile regression in a distributed computing environment (e.g., a large-scale sensor network) or to real-time streaming data.
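
As a point of reference, here is a minimal Python sketch (an illustration, not the speaker's proposed method) of the naive divide-and-conquer baseline mentioned in the abstract: split the $n$ samples into batches of size $m$, fit a quantile regression on each batch, and average the coefficient estimates. It assumes statsmodels is available; the data and sizes are made up for the example.

```python
# Naive divide-and-conquer baseline for quantile regression: fit each
# batch of size m separately and average the per-batch estimates.
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

def dc_quantile_regression(X, y, m, q=0.5):
    """Average of per-batch quantile regression fits.

    As noted in the abstract, this baseline is valid (to first order)
    only when n = o(m^2).
    """
    n = len(y)
    batch_estimates = []
    for start in range(0, n - m + 1, m):
        Xb, yb = X[start:start + m], y[start:start + m]
        fit = QuantReg(yb, Xb).fit(q=q)  # quantile regression on one batch
        batch_estimates.append(fit.params)
    return np.mean(batch_estimates, axis=0)

# Simulated example: n samples, p features, median regression.
rng = np.random.default_rng(0)
n, p, m = 10_000, 5, 500
X = rng.standard_normal((n, p))
beta = np.arange(1, p + 1, dtype=float)
y = X @ beta + rng.standard_normal(n)  # noise has median 0
print(dc_quantile_regression(X, y, m))  # close to beta
```

Averaging per-batch estimates keeps only one batch in memory at a time, but, per the abstract, this is exactly the regime where the naive approach breaks down once $n$ grows beyond $o(m^2)$.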

Speech 2: Inference after cross-validation
Speaker: Joshua Loftus (NYU)
Abstract: We describe a method for performing inference on models chosen by cross-validation. When the test error being minimized in cross-validation is a residual sum of squares, it can be written as a quadratic form. This allows us to apply the inference framework of Loftus et al. (2016) for models determined by quadratic constraints to the model that minimizes the CV test error. Our only requirement on the model training procedure is that its selection events are regions satisfying linear or quadratic constraints. This includes both the Lasso and forward stepwise regression, which serve as our main examples throughout. We do not require knowledge of the error variance. The overall procedure is a computationally intensive method for selecting models adaptively and performing inference on the selected model.
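
To make the quadratic-form observation concrete, here is a minimal NumPy sketch (an illustration under simplifying assumptions, not the authors' code) for the special case where the trained model is ordinary least squares on a fixed selected feature subset: the K-fold CV residual sum of squares then equals $y^\top A y$ for an explicit matrix $A$.

```python
# For a linear training procedure (OLS on a fixed feature subset), the
# held-out residuals of each fold are linear in y, so the K-fold CV
# residual sum of squares is the quadratic form y^T A y.
import numpy as np

def cv_rss_quadratic_form(X, folds):
    """Return A such that the CV test error equals y @ A @ y.

    `folds` is a list of (train_idx, test_idx) index arrays; the model
    is OLS on all columns of X (a fixed selected subset).
    """
    n = X.shape[0]
    A = np.zeros((n, n))
    for train, test in folds:
        Xt, Xv = X[train], X[test]
        # Held-out residuals are M @ y, with M built column-by-column:
        M = np.zeros((len(test), n))
        M[np.arange(len(test)), test] = 1.0  # picks out y_test
        M[:, train] -= Xv @ np.linalg.solve(Xt.T @ Xt, Xt.T)  # minus prediction
        A += M.T @ M
    return A

# Check against a direct computation on simulated data.
rng = np.random.default_rng(1)
n, p, K = 60, 3, 5
X = rng.standard_normal((n, p))
y = X @ np.ones(p) + rng.standard_normal(n)
idx = np.array_split(np.arange(n), K)
folds = [(np.setdiff1d(np.arange(n), v), v) for v in idx]
A = cv_rss_quadratic_form(X, folds)
direct = sum(
    np.sum((y[v] - X[v] @ np.linalg.lstsq(X[t], y[t], rcond=None)[0]) ** 2)
    for t, v in folds
)
print(np.isclose(y @ A @ y, direct))  # True
```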

Speech 3: Optimal estimation of Gaussian mixtures via denoised method of moments
Speaker: Yihong Wu (Yale)
Abstract: The method of moments is one of the most widely used techniques for parameter estimation in statistics: estimates are obtained by solving the system of equations that matches the population moments with their sample counterparts. However, in practice, and especially for the important case of mixture models, one frequently needs to contend with the difficulties of non-existence or non-uniqueness of statistically meaningful solutions, as well as the high computational cost of solving large polynomial systems. Moreover, theoretical analyses of the method of moments are mainly confined to asymptotic-normality-type results established under strong assumptions.
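
For background, here is a minimal NumPy sketch of the classical (un-denoised) method of moments for a symmetric two-component Gaussian location mixture $0.5\,N(\mu_1,1) + 0.5\,N(\mu_2,1)$. It matches the first two sample moments and solves the resulting quadratic; the discriminant check illustrates the non-existence issue the abstract raises, since noisy sample moments may admit no real roots. The denoised method of the talk is not what is shown here.

```python
# Classical method of moments for 0.5*N(mu1, 1) + 0.5*N(mu2, 1):
# population moments are m1 = (mu1 + mu2)/2 and m2 = (mu1^2 + mu2^2)/2 + 1,
# so mu1, mu2 are the roots of t^2 - s*t + q with s, q as below.
import numpy as np

def mom_two_gaussians(x):
    m1, m2 = np.mean(x), np.mean(x ** 2)
    s = 2 * m1                        # mu1 + mu2
    q = (s ** 2 - 2 * (m2 - 1)) / 2   # mu1 * mu2
    disc = s ** 2 - 4 * q
    if disc < 0:
        # The non-existence difficulty: no real solution to the system.
        raise ValueError("sample moments admit no real solution")
    root = np.sqrt(disc)
    return (s - root) / 2, (s + root) / 2

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-1, 1, 5_000), rng.normal(2, 1, 5_000)])
print(mom_two_gaussians(x))  # approximately (-1, 2)
```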