Session 37: Recent advances in spectral methods for complex data – Conference on Statistical Learning and Data Science / Nonparametric Statistics

Session title: Recent advances in spectral methods for complex data
Organizer: Yuekai Sun (UMich)
Chair: Edgar Dobriban (UPenn)
Time: June 6th, 8:30am – 10:00am
Location: VEC 1202/1203

Speech 1: Analyzing Developmental Processes with Optimal Transport
Speaker: Geoff Schiebinger (MIT)
Abstract: Understanding the molecular programs that guide cellular differentiation during development is a major goal of modern biology. Here, we introduce an approach, based on the mathematics of optimal transport, for inferring developmental landscapes, probabilistic cellular fates and dynamic trajectories from large-scale single-cell RNA-seq (scRNA-seq) data collected along a time course. Our approach, Waddington-OT, is based on a novel framework for analyzing stochastic processes whose instantaneous temporal couplings agree with optimal transport. We demonstrate the power of Waddington-OT by applying it to 65,781 scRNA-seq profiles collected at 10 time points over 16 days during reprogramming of fibroblasts to iPSCs.

We construct a high-resolution map of reprogramming that rediscovers known features; uncovers new alternative cell fates including neural- and placental-like cells; predicts the origin and fate of any cell class; highlights senescent-like cells that may support reprogramming through paracrine signaling; and implicates regulatory models in particular trajectories. Of these findings, we highlight Obox6, which we experimentally show enhances reprogramming efficiency. Our approach provides a general framework for investigating cellular differentiation.

Speech 2: How to select the number of components in PCA and factor analysis? Understanding and improving permutation methods
Speaker: Edgar Dobriban (Wharton)
Abstract: Selecting the number of components in PCA and factor analysis is a key problem facing practitioners of data science. One of the most popular methods is a permutation approach that randomly scrambles the elements of each feature. It selects the components whose singular values are large compared to those of the permuted data. This method (also known as parallel analysis) is recommended in many textbooks and review papers, and is used in genomics by leading applied statisticians including T Hastie, M Stephens, J Storey, R Tibshirani and WH Wong. However, it is poorly understood. In this talk, we develop a theoretical understanding and propose improvements.
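
For readers unfamiliar with the permutation approach described in this abstract, the following is a minimal sketch of classical parallel analysis, not the improved method proposed in the talk. The function name `parallel_analysis`, the 95% quantile threshold, the number of permutations, and the rule of stopping at the first component that fails the comparison are all illustrative assumptions; published variants differ on each of these choices.

```python
import numpy as np

def parallel_analysis(X, n_perm=20, quantile=0.95, seed=0):
    """Illustrative sketch: pick the number of components by permutation.

    X is an (n_samples, n_features) array. The threshold rule used here
    (per-rank quantile, sequential stopping) is one common convention,
    assumed for illustration only.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)  # center each feature (column)
    sv = np.linalg.svd(X, compute_uv=False)

    perm_sv = np.empty((n_perm, sv.size))
    for b in range(n_perm):
        # Scramble the entries of each feature independently, which
        # destroys correlations between features but keeps each
        # feature's marginal distribution.
        Xp = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(X.shape[1])]
        )
        perm_sv[b] = np.linalg.svd(Xp, compute_uv=False)

    # Per-rank threshold taken from the permuted singular values.
    threshold = np.quantile(perm_sv, quantile, axis=0)

    # Count leading components that exceed the threshold, stopping at
    # the first one that does not.
    k = 0
    for observed, cutoff in zip(sv, threshold):
        if observed <= cutoff:
            break
        k += 1
    return k

# Usage: synthetic data with two strong factors plus unit-variance noise.
rng = np.random.default_rng(1)
factors = rng.normal(size=(200, 2))
loadings = 3.0 * rng.normal(size=(2, 30))
X_demo = factors @ loadings + rng.normal(size=(200, 30))
k_demo = parallel_analysis(X_demo)
```

Because permuting preserves each feature's total variance, strong factors inflate the permuted singular values, which is one reason the behavior of the method is subtle.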

Speech 3: Higher-order spectral graph clustering with motifs
Speaker: Austin Benson (Cornell)
Abstract: Networks are typically described by lower-order connectivity patterns that are captured at the level of individual nodes and edges. However, higher-order connectivity patterns captured by small subgraphs, or network motifs, describe the fundamental structures that control and mediate the behavior of many complex systems. In this talk, I will discuss a higher-order spectral graph clustering framework that finds groups of nodes that participate in many instances of a given motif. I will also show applications of this framework in ecology, biology, transportation, and social networks.
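
As a rough illustration of the idea in this abstract, the sketch below builds a motif-weighted adjacency matrix for the undirected triangle motif and splits the nodes using the second eigenvector of its normalized Laplacian. The function names are hypothetical, the triangle is only one possible motif, and the sign-based split is a simplification assumed here in place of a sweep cut over motif conductance; this is not presented as the speaker's implementation.

```python
import numpy as np

def triangle_motif_adjacency(A):
    """Entry (i, j) counts the triangles containing both i and j.

    A is the symmetric 0/1 adjacency matrix of a simple undirected graph.
    """
    A = np.asarray(A, dtype=float)
    # (A @ A)[i, j] counts common neighbours of i and j; masking by A
    # keeps only pairs that are themselves connected, i.e. triangles.
    return (A @ A) * A

def motif_spectral_bipartition(A):
    """Split nodes in two using the triangle-weighted graph (sketch).

    Returns a label per node: 0 or 1, or -1 for nodes in no triangle.
    Assumes the triangle-weighted graph is connected on the kept nodes.
    """
    W = triangle_motif_adjacency(A)
    d = W.sum(axis=1)
    keep = d > 0  # nodes that take part in at least one triangle
    Wk = W[np.ix_(keep, keep)]
    d_inv_sqrt = 1.0 / np.sqrt(d[keep])

    # Normalized Laplacian of the motif-weighted graph.
    L = np.eye(Wk.shape[0]) - d_inv_sqrt[:, None] * Wk * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = d_inv_sqrt * vecs[:, 1]

    labels = np.full(W.shape[0], -1)
    labels[keep] = (fiedler > 0).astype(int)
    return labels

# Usage: two 4-cliques, {0,1,2,3} and {4,5,6,7}, joined by the
# single triangle (3, 4, 5).
A_demo = np.zeros((8, 8))
edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
edges += [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
edges += [(3, 4), (3, 5)]
for i, j in edges:
    A_demo[i, j] = A_demo[j, i] = 1.0
labels_demo = motif_spectral_bipartition(A_demo)
```

Which label (0 or 1) each group receives depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful.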