Session 4: New methods for directed acyclic Gaussian graph and adaptive data analysis – Conference on Statistical Learning and Data Science / Nonparametric Statistics

Session title: New methods for directed acyclic Gaussian graph and adaptive data analysis
Organizer: Yichao Wu (UIC)
Chair: Weibin Mo (UNC)
Time: June 4^th, 9:00-10:30am
Location: VEC 1302/1303

Speech 1: Sublinear-Time Adaptive Data Analysis
Speaker: Lev Reyzin (UIC)
Abstract: The topic of this talk lies in the burgeoning area of adaptive data analysis, where the goal is to design mechanisms that can give statistically valid answers to adaptively generated queries. In this talk, I will discuss sublinear-time mechanisms for answering adaptive queries into datasets. These mechanisms provide a polynomial speed-up per query over previous approaches, without increasing the total amount of data needed. I will also discuss a new method for achieving statistically-meaningful responses even when the mechanism is only allowed to see a constant number of samples from the data per query. Finally, I will show how these techniques also yield improved bounds for adaptively optimizing convex and strongly convex functions over a dataset. This is based on work joint with Benjamin Fish and Benjamin I. P. Rubinstein.

Speech 2: Reconstruction of a directed acyclic Gaussian graph for observational and interventional data
Speaker: Xiaotong Shen (U Minnesota)
Abstract: Directed acyclic graphs are widely used to describe, among interacting units, causal relations. Causal relations are estimated by reconstructing a directed acyclic graph’s structure, presenting a great challenge when the unknown total ordering of a DAG needs to be estimated.In such a situation, it remains unclear if a graph’s structure is reconstructable in the absence of an identifiable likelihood with regard to graphs, and in facing super-exponentially many candidate graphs in the number of nodes. In this talk, I will introduce a global approach to process observational data and interventional data, to identify all estimable causal directions and estimate model parameters. This approach uses constrained maximum likelihood with nonconvex constraints reinforcing the non-loop equirement to yield an estimated directed acyclic graph, where super-exponentially many constraints characterize the major challenge. Computational issues will be discussed in addition to some theoretical aspects. This work is joint with Y. Yuan, W. Pan, Z. Wang and S. Peng.

.Speech 3: Convex clustering over an undirected graph
Speaker: Yunzhang Zhu (OSU)
Abstract: Cluster analysis is a fundamental problem in statistics. It aims at categorizing the observations into different groups, called clusters, such that observations in the same cluster tend to be more similar to each other than those from different clusters. In this talk, I will introduce a new optimization-based clustering method called convex clustering over (weighted) undirected graph. The choice of both the graph and its weights is crucial to clustering performance as well as the algorithm’s computational efficiency. Specifically, we consider two types of graphs: a minimum spanning tree and a so-called K-means bipartite graph. Computationally, both graphs make the associated optimization problems easier to solve compared to that with a complete graph; and statistically, both lead to better clustering results. Further numerical comparisons with the K-means clustering algorithm and a density-based algorithm demonstrate the superior performance of the proposed algorithms.