Session 46: New developments for large complex data – Conference on Statistical Learning and Data Science / Nonparametric Statistics

Session title: New developments for large complex data
Organizer: Annie Qu (UIUC)
Chair: Annie Qu (UIUC)
Time: June 6th, 3:15pm – 4:45pm
Location: VEC 902

Speech 1: Point and Interval Estimations for Individualized MCID
Speaker: Jiwei Zhao (SUNY, Buffalo)
Abstract: The minimal clinically important difference (MCID) is the smallest change in a treatment outcome that an individual patient would identify as important. In the era of precision medicine, it is of particular interest to study both point and interval estimation for the individualized MCID. The motivating example of this work is the ChAMP trial, a randomized controlled trial comparing debridement with observation of chondral lesions encountered during partial meniscectomy. In this trial, the primary outcome is the patient-reported pain score one year after surgery, and we are interested in estimating the individualized MCID so that the treatment effect can be studied further. In this paper, we formulate the problem in a classification setting in which a nonconvex minimization technique is needed for the optimization. Furthermore, we develop the Bahadur representation of the individualized MCID so that its confidence interval can be derived. The proposed method is illustrated via comprehensive simulation studies. We also apply the proposed methodology to the analysis of the ChAMP trial.

Speech 2: Robust Probabilistic Classification for Irregularly Sampled Functional Data
Speaker: Doug Simpson (UIUC)
Abstract: Motivated by research on diagnostic ultrasound to evaluate tissue regions of interest, we present a robust probabilistic classifier for functional data that predicts the class membership for a given input and provides reliable probability estimates for class memberships. The method combines a Bayes classifier and a semi-parametric mixed-effects model with a robust tuning parameter. We aim to make the method robust to outlying curves, especially in providing a robust estimate of certainty in prediction, which is crucial in medical diagnosis. The approach is applicable to various structures, such as samples observed over varying intervals or repeatedly measured curves retaining between-curve correlation, with no parametric assumption on the within-curve covariance. We conduct simulation studies to investigate the operating characteristics of the probability estimates for both ideal data and data with outlying curves, and we compare the method with other functional classification procedures. We illustrate the methodology in the classification of quantitative ultrasound data and in other applications.

Speech 3: A new method for constructing gene co-expression networks based on samples with tumor purity heterogeneity
Speaker: Francesca Petralia (Mount Sinai)
Abstract: Tumor tissue samples often contain an unknown fraction of stromal cells. This problem, known as tumor purity heterogeneity (TPH), was recently recognized as a severe issue in omics studies. Specifically, if TPH is ignored when inferring co-expression networks, edges are likely to be estimated among genes with a mean shift between non-tumor and tumor cells rather than among gene pairs that interact with each other in tumor cells. To address this issue, we propose TSNet, a new method that constructs tumor-cell-specific gene/protein co-expression networks based on gene/protein expression profiles of tumor tissues. TSNet treats the observed expression profile as a mixture of expressions from different cell types and explicitly models the tumor purity percentage in each tumor sample. Using extensive synthetic data experiments, we demonstrate that TSNet outperforms a standard graphical model that does not account for tumor purity heterogeneity. We then apply TSNet to estimate tumor-specific gene co-expression networks based on TCGA ovarian cancer RNAseq data. We identify novel co-expression modules and hub structures specific to tumor cells.