Session 47: Statistical inference and complex data structures

Session title: Statistical inference and complex data structures
Organizer: Eric Laber (NCST)
Chair: Yubai Yuan (UIUC)
Time: June 6th, 3:15pm – 4:45pm
Location: VEC 1303

Speech 1: Inter-modal Coupling: A Class of Measurements for Studying Local Covariance Patterns Among Multiple Imaging Modalities

Speaker: Kristin Linn (UPenn)
Abstract: Local cortical coupling was recently introduced as a subject-specific measure for studying localized relationships between cortical thickness and sulcal depth. Although it is a promising first step towards understanding local covariance patterns between these two specific neuroanatomical measurements, local cortical coupling is limited in the scope of imaging modalities that can be analyzed within its framework. We generalize and improve this local coupling measure by proposing an analogue in volumetric space that can be used to produce subject-level feature images among an arbitrary number of volumetric imaging modalities. Our proposed class of measures, collectively referred to as inter-modal coupling (IMCo), is based on a locally weighted regression framework. In this work, we study IMCo between cerebral blood flow and gray matter density using a sample of youths aged 8-21 from the Philadelphia Neurodevelopmental Cohort. We describe how these two modalities covary spatially throughout the brain and find evidence of significant developmental effects in several notable regions. We also give an overview of other applications where we are applying IMCo to study relationships between multiple types of images.
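
As a rough illustration of the locally weighted regression idea mentioned in the abstract, the sketch below computes a voxel-wise weighted least-squares slope between two 3D images. It is not the speaker's IMCo implementation: the function name `local_slope_image`, the cubic neighborhood, the Gaussian kernel, and the `radius` and `sigma` parameters are all assumptions chosen for illustration.

```python
import numpy as np


def local_slope_image(x, y, radius=1, sigma=1.0):
    """Weighted least-squares slope of y on x around each interior voxel.

    Hypothetical helper: the neighborhood shape and Gaussian weights are
    illustrative assumptions, not details taken from the abstract.
    """
    offsets = np.arange(-radius, radius + 1)
    dz, dy, dx = np.meshgrid(offsets, offsets, offsets, indexing="ij")
    # Gaussian kernel weights, normalized to sum to one.
    w = np.exp(-(dz**2 + dy**2 + dx**2) / (2.0 * sigma**2))
    w = w / w.sum()

    slopes = np.full(x.shape, np.nan)  # border voxels stay NaN
    r = radius
    for i in range(r, x.shape[0] - r):
        for j in range(r, x.shape[1] - r):
            for k in range(r, x.shape[2] - r):
                xs = x[i - r:i + r + 1, j - r:j + r + 1, k - r:k + r + 1]
                ys = y[i - r:i + r + 1, j - r:j + r + 1, k - r:k + r + 1]
                mx = np.sum(w * xs)  # weighted local means
                my = np.sum(w * ys)
                var_x = np.sum(w * (xs - mx) ** 2)
                if var_x > 0:
                    cov_xy = np.sum(w * (xs - mx) * (ys - my))
                    slopes[i, j, k] = cov_xy / var_x
    return slopes
```

Applied to two co-registered volumes, the returned array is a subject-level feature image of local slopes; the explicit triple loop favors readability over speed.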

Speech 2: Modeling Heterogeneity in Motor Learning using Heteroskedastic Functional Principal Components
Speaker: Jeff Goldsmith (Columbia University)
Abstract: We propose a novel method for estimating population-level and subject-specific effects of covariates on the variability of functional data. We extend the functional principal components analysis framework by modeling the variance of principal component scores as a function of covariates and subject-specific random effects. In a setting where principal components are largely invariant across subjects and covariate values, modeling the variance of these scores provides a flexible and interpretable way to explore factors that affect the variability of functional data. Our work is motivated by a novel dataset from an experiment assessing upper extremity motor control, and quantifies the reduction in motion variance associated with skill learning.
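
To make the idea of covariate-dependent score variance concrete, here is a minimal sketch on simulated data. It swaps in a much simpler technique than the one described in the abstract: scores on the first principal component are obtained by SVD, and their log squared values are regressed on a covariate by ordinary least squares, with no subject-specific random effects. The function names, the single-component setup, and all simulation settings are assumptions for illustration only.

```python
import numpy as np


def simulate_curves(n=5000, n_grid=50, gamma0=1.0, gamma1=-2.0, seed=0):
    """Simulate curves whose score variance is exp(gamma0 + gamma1 * x).

    All settings here are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_grid)
    phi = np.sin(2.0 * np.pi * t)
    phi = phi / np.linalg.norm(phi)  # unit-norm principal component
    x = rng.uniform(0.0, 1.0, size=n)  # covariate, e.g. amount of training
    score_sd = np.exp(0.5 * (gamma0 + gamma1 * x))
    scores = rng.normal(0.0, score_sd)
    noise = 0.05 * rng.normal(size=(n, n_grid))
    return np.outer(scores, phi) + noise, x


def score_variance_slope(curves, covariate):
    """Estimate the log-variance slope of first-component scores."""
    centered = curves - curves.mean(axis=0)
    # Rows of vt are the estimated principal component directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[0]
    design = np.column_stack([np.ones_like(covariate), covariate])
    # OLS on log squared scores: the intercept is biased by a constant,
    # but the slope targets the log-variance coefficient.
    coef, *_ = np.linalg.lstsq(design, np.log(scores**2), rcond=None)
    return coef[1]
```

Calling `score_variance_slope(*simulate_curves())` returns an estimate of the simulated coefficient `gamma1`; a negative value corresponds to score variance that shrinks as the covariate increases.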

Speech 3: Prior Adaptive Semi-supervised Learning with Application to Electronic Health Records Phenotyping
Speaker: Yichi Zhang (Harvard) 
Abstract: Electronic Health Records (EHR) provide large and rich data sources for biomedical research, and EHR data have been successfully used to gain novel insights into several diseases. However, the use of EHR data remains quite limited, because extracting a precise phenotype for an individual patient requires labor-intensive medical chart review, and such a manual process is not scalable. To facilitate an automatic procedure for accurate phenotyping, we formulate the problem in a high-dimensional setting and propose a semi-supervised method that combines information from chart-reviewed records with data-driven prior knowledge derived from the entire dataset. The proposed estimator, the Prior Adaptive Semi-supervised (PASS) estimator, enjoys nice theoretical properties, including efficiency and robustness, and applies to a broad class of problems beyond EHR applications. The finite-sample performance is evaluated via simulation studies and a real dataset on rheumatoid arthritis phenotyping. Further improvements involving word embedding and selective sampling are discussed.