Session 20: OODA: Manifold Data Integration – Conference on Statistical Learning and Data Science / Nonparametric Statistics

Session title: OODA: Manifold Data Integration
Organizer: Marron, James Stephen (UNC)
Chair: Anna Smith (Columbia)
Time: June 5th, 8:30am – 10:00am
Location: VEC 1302

Speech 1: Random Domain Decomposition for Kriging Riemannian Data
Speaker: Piercesare Secchi (Politecnico di Milano)
Abstract:

Investigation of object data distributed over complex domains gives rise to new challenges for spatial statistics. The linear methods of geostatistics cannot be directly applied when the embedding space for the observed data is not a Hilbert space. Moreover, global assumptions about the stationarity of the random field generating the data are often unsuitable when the spatial domain is large, textured or convoluted, with holes or barriers. Here we examine the Random Domain Decomposition computational approach proposed by Menafoglio et al. (2018), applied to the analysis and prediction of tensor data observed over a complex spatial domain. As an illustrative case study, we will analyse the covariances between dissolved oxygen and water temperature in the Chesapeake Bay.

This is joint work with Alessandra Menafoglio (Dept. of Mathematics, Politecnico di Milano) and Davide Pigoli (Dept. of Mathematics, King’s College London).

Speech 2: Nonparametric K-Sample Test on Riemannian Manifolds with Applications to Analyzing Mitochondrial Shapes
Speaker: Ruiyi Zhang (Florida State)
Abstract: This paper develops a nonparametric approach for a k-sample test involving data lying on a Riemannian manifold, such as a shape manifold. The specific problem is to test the hypothesis that a factor (such as the subject, cell, or living conditions) significantly affects mitochondrial morphology as observed in images of skeletal muscles of mice. The fact that a shape space is non-Euclidean and infinite-dimensional rules out standard ANOVA decomposition and requires new ideas. In this work, we extend a metric-based approach, previously developed for Euclidean spaces and termed DISCO analysis, to several shape representations of planar closed curves. This adaptation leads to a statistic for testing equality of distributions across groups. We provide the underlying theory for one shape representation, but also apply the test to several other shape metrics. Since the data have a nested structure, we also develop a procedure to test a factor while it includes another significant factor. We analyze and present results for a mitochondria shape dataset, including an interesting result that a change in lifestyle alters the shapes of some types of mitochondria.
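
The metric-based idea behind a DISCO-type test can be illustrated with a minimal sketch. It is not the speakers' implementation: it assumes points on the unit sphere with great-circle distance as a stand-in for a shape metric, uses distance exponent 1, and calibrates the statistic by permutation. All function names (`geodesic_distance`, `disco_components`, `disco_f`, `permutation_p_value`, `noisy_point`) are hypothetical.

```python
import math
import random


def geodesic_distance(p, q):
    """Great-circle distance between two unit vectors (assumed stand-in metric)."""
    dot = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))


def mean_pairwise(xs, ys, dist):
    """Average of dist(x, y) over all pairs with x in xs and y in ys."""
    return sum(dist(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def disco_components(groups, dist=geodesic_distance):
    """Return (between, within, total) dispersion; total = between + within."""
    n_total = sum(len(g) for g in groups)
    pooled = [x for g in groups for x in g]
    within = sum(len(g) / 2.0 * mean_pairwise(g, g, dist) for g in groups)
    total = n_total / 2.0 * mean_pairwise(pooled, pooled, dist)
    between = 0.0
    for j in range(len(groups)):
        for l in range(j + 1, len(groups)):
            a, b = groups[j], groups[l]
            # Energy-type contrast: cross-group distances vs. within-group ones.
            contrast = (2.0 * mean_pairwise(a, b, dist)
                        - mean_pairwise(a, a, dist)
                        - mean_pairwise(b, b, dist))
            between += len(a) * len(b) / (2.0 * n_total) * contrast
    return between, within, total


def disco_f(groups, dist=geodesic_distance):
    """Ratio of between- to within-group dispersion, each scaled by its df."""
    between, within, _ = disco_components(groups, dist)
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    return (between / (k - 1)) / (within / (n_total - k))


def permutation_p_value(groups, n_perm=199, seed=0, dist=geodesic_distance):
    """Permutation p-value: reshuffle group labels and recompute the ratio."""
    rng = random.Random(seed)
    observed = disco_f(groups, dist)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted, start = [], 0
        for size in sizes:
            permuted.append(pooled[start:start + size])
            start += size
        if disco_f(permuted, dist) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def noisy_point(center, spread, rng):
    """Toy data generator: perturb a unit vector and project back to the sphere."""
    v = [c + rng.gauss(0.0, spread) for c in center]
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


if __name__ == "__main__":
    rng = random.Random(1)
    north = [noisy_point((0.0, 0.0, 1.0), 0.1, rng) for _ in range(8)]
    east = [noisy_point((1.0, 0.0, 0.0), 0.1, rng) for _ in range(8)]
    print("F ratio:", disco_f([north, east]))
    print("permutation p-value:", permutation_p_value([north, east]))
```

Because the statistic depends on the data only through pairwise distances, swapping `geodesic_distance` for another metric (for example, a distance between shape representations of closed curves) leaves the rest of the sketch unchanged.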

Speech 3: High-Dimensional Manifold Data Clustering on Symmetric Spaces
Speaker: Chao Huang (UNC)
Abstract: Clustering is one of the fundamental tools in manifold learning, and it has been extensively studied in many applications. However, in many image analysis problems (e.g., directional data analysis, shape analysis), most existing clustering methods established in Euclidean space face several challenges, including a symmetric space, a high-dimensional feature space, and manifold data variation associated with some covariates. In order to address such challenges, a penalized model-based clustering framework is developed to cluster high-dimensional manifold data in symmetric spaces. Specifically, a mixture of geodesic factor analyzers (MGFA) is proposed, with mixing proportions defined through a logistic model and a Riemannian normal distribution in each component for data in symmetric spaces. A geodesic factor analyzer is established to explicitly model the high-dimensional features. Penalized likelihood approaches are used to realize variable selection procedures. Simulation studies are performed on data generated from the unit sphere, and a real data analysis is performed on the corpus callosum (CC) shape data from the ADNI study.
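
As a rough illustration of the geometry involved when data live on the unit sphere, the sketch below implements the exponential and logarithm maps, which carry tangent vectors onto the sphere and back. It is only a building block under stated assumptions, not the MGFA model from the talk: the helper names are hypothetical, and `sample_around` draws a Gaussian in the tangent space and wraps it onto the sphere, which is a simple surrogate and not the Riemannian normal distribution itself.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def exp_map(p, v):
    """Follow the geodesic from unit vector p along tangent vector v (v orthogonal to p)."""
    length = norm(v)
    if length < 1e-12:
        return list(p)
    return [math.cos(length) * p_i + math.sin(length) * v_i / length
            for p_i, v_i in zip(p, v)]


def log_map(p, q):
    """Tangent vector at p that points to q, with length equal to the geodesic distance."""
    c = max(-1.0, min(1.0, dot(p, q)))
    theta = math.acos(c)
    residual = [q_i - c * p_i for p_i, q_i in zip(p, q)]
    r = norm(residual)
    if r < 1e-12:
        if c > 0:
            return [0.0] * len(p)  # q coincides with p
        raise ValueError("log map is not unique for antipodal points")
    return [theta * x / r for x in residual]


def sample_around(p, sigma, rng):
    """Wrapped tangent-space Gaussian around p (a surrogate, see lead-in)."""
    g = [rng.gauss(0.0, sigma) for _ in p]
    radial = dot(g, p)
    tangent = [g_i - radial * p_i for g_i, p_i in zip(g, p)]  # remove component along p
    return exp_map(p, tangent)


if __name__ == "__main__":
    rng = random.Random(0)
    center = [0.0, 0.0, 1.0]
    cloud = [sample_around(center, 0.2, rng) for _ in range(5)]
    for x in cloud:
        print([round(c, 3) for c in x], "distance to center:", round(norm(log_map(center, x)), 3))
```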