Session 44: Recent Advances in Statistical Network, Functional and High-dimensional Data Analysis – Conference on Statistical Learning and Data Science / Nonparametric Statistics

Session title: Recent Advances in Statistical Network, Functional and High-dimensional Data Analysis
Organizer: Ji Zhu (Umich)
Chair: Yujia Deng (UIUC)
Time: June 6th, 1:15pm – 2:45pm
Location: VEC 1402

Speech 1: Factor Augmented Vector Autoregressive Models under High Dimensional Scaling
Speaker: George Michailidis (U of Florida)
Abstract: Vector Autoregressive (VAR) models are widely used in applied economics and finance. In this talk, we consider a VAR model augmented with dynamically evolving factors. The time series modeled as a VAR, together with the dynamic factors, relate to a large number of other time series that aid in the identifiability of the model parameters. We investigate the identifiability of such models, as well as estimation and inference issues under high-dimensional scaling. The performance of the proposed methods is assessed through synthetic data and the methodology is illustrated on an economic data set.

Speech 2: Model-assisted design of experiments on networks and social media platforms
Speaker: Edoardo Airoldi (Harvard)
Abstract: Classical approaches to causal inference largely rely on the assumption of “lack of interference”, according to which the outcome of an individual does not depend on the treatment assigned to others, as well as on many other simplifying assumptions, including the absence of strategic behavior. In many applications, however, such as evaluating the effectiveness of healthcare interventions that leverage social structure, assessing the impact of product innovations and ad campaigns on social media platforms, or experimentation at scale in large IT companies, assuming lack of interference and other simplifying assumptions is untenable. Moreover, the effect of interference itself is often an inferential target of interest, rather than a nuisance. In this talk, we will formalize technical issues that arise in estimating causal effects when interference can be attributed to a network among the units of analysis, within the potential outcomes framework. We will introduce and discuss several strategies for experimental design in this context centered around a judicious use of statistical models, which we refer to as “model-assisted” design of experiments. In particular, we wish for certain finite-sample properties of the estimator to hold even if the model catastrophically fails, while we would like to gain efficiency if certain aspects of the model are correct. We will then contrast design-based, model-based and model-assisted approaches to experimental design from a decision-theoretic perspective.

Speech 3: Correcting Selection Bias via Functional Empirical Bayes
Speaker: Gareth James (USC)
Abstract: Selection bias results from the selection of extreme observations and is a well-recognized issue for standard scalar or multivariate data. Numerous approaches have been proposed to address the issue, dating back at least as far as the James-Stein shrinkage estimator. However, the same potential issue arises, albeit with additional complications, for functional data. Given a set of observed functions, one may wish to select for further analysis those which are most extreme according to some metric such as the average, maximum, or minimum value of the function. However, given that the functions are often noisy realizations of some underlying mean process, these outliers are likely to generate biased estimates of the quantity of interest. In this talk I propose an Empirical Bayes approach, using Tweedie’s formula, to adjust such functional data to generate approximately unbiased estimates of the true mean functions. The approach has several advantages. It is non-parametric in nature, but is capable of automatically shrinking back towards a James-Stein type estimator in low signal situations. It is also computationally efficient and possesses desirable theoretical properties. Furthermore, I demonstrate through extensive simulations that the approach can produce significant improvements in prediction accuracy relative to possible competitors. This is joint work with Joshua Derenski and Yingying Fan.
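As a rough illustration of the Tweedie's formula idea mentioned in the abstract, the sketch below covers only the scalar Gaussian case, not the functional method presented in the talk. For an observation z drawn from N(mu, sigma^2), Tweedie's formula gives E[mu | z] = z + sigma^2 * d/dz log f(z), where f is the marginal density of z. The function names, the normal prior, and the numerical derivative are all assumptions made for this example.

```python
import math

def log_normal_pdf(z, var):
    """Log density of a zero-mean normal with variance `var`."""
    return -0.5 * math.log(2 * math.pi * var) - z * z / (2 * var)

def tweedie_correct(z, sigma2, log_marginal, eps=1e-5):
    """Scalar Tweedie correction: z + sigma^2 * d/dz log f(z).

    `log_marginal` is the log marginal density of z (assumed known here;
    in practice it would be estimated from the data). The derivative is
    approximated with a central finite difference.
    """
    score = (log_marginal(z + eps) - log_marginal(z - eps)) / (2 * eps)
    return z + sigma2 * score

# Assumed toy setting: mu ~ N(0, prior_var), z | mu ~ N(mu, sigma2),
# so the marginal of z is N(0, prior_var + sigma2).
prior_var = 1.0
sigma2 = 1.0

def log_marginal(z):
    return log_normal_pdf(z, prior_var + sigma2)

observed = 3.0  # an "extreme" observation that was selected
corrected = tweedie_correct(observed, sigma2, log_marginal)

# In this toy setting the exact posterior mean is the linear shrinkage
# observed * prior_var / (prior_var + sigma2).
shrinkage = observed * prior_var / (prior_var + sigma2)
print(corrected, shrinkage)
```

In this Gaussian toy setting the correction reduces to a linear shrinkage of the extreme observation towards zero, which is the kind of James-Stein type behavior the abstract alludes to.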