Session 25: New development for analyzing biomedical complex data – Conference on Statistical Learning and Data Science / Nonparametric Statistics

Session title: New development for analyzing biomedical complex data
Organizer: Zhezhen Jin (Columbia)
Chair: Peng Wang (University of Cincinnati)
Time: June 5th, 1:15pm – 2:45pm
Location: VEC 1302

Speech 1: New methods for estimating follow-up rates in cohort studies
Speaker: Xiaonan Xue (Albert Einstein College of Medicine)
Abstract:
Background: The follow-up rate, a standard index of the completeness of follow-up, is important for assessing the validity of a cohort study. A common method for estimating the follow-up rate, the “Percentage Method”, defined as the fraction of all enrollees who developed the event of interest or had complete follow-up, can severely underestimate the degree of follow-up. Among the alternatives, the median follow-up time does not indicate the completeness of follow-up, and the reverse Kaplan-Meier based method and Clark’s Completeness Index (CCI) also have limitations.
Methods: We propose a new definition for the follow-up rate, the Person-Time Follow-up Rate (PTFR), which is the observed person-time divided by total person-time assuming no dropouts. The PTFR cannot be calculated directly since the event times for dropouts are not observed. Therefore, two estimation methods are proposed: a formal person-time method (FPT) in which the expected total follow-up time is calculated using the event rate estimated from the observed data, and a simplified person-time method (SPT) that avoids estimation of the event rate by assigning full follow-up time to all events. Simulations were conducted to measure the accuracy of each method, and each method was applied to a prostate cancer recurrence study dataset.
Results: Simulation results showed that the FPT has the highest accuracy overall. In most situations, the computationally simpler SPT and CCI methods are only slightly biased. When applied to a retrospective cohort study of cancer recurrence, the FPT, CCI and SPT showed substantially greater 5-year follow-up than the Percentage Method (92%, 92% and 93% vs 68%).
Conclusions: The person-time methods correct a systematic error in the standard Percentage Method for calculating follow-up rates. The easy-to-use SPT and CCI methods can be used in tandem to obtain an accurate and tight interval for the PTFR. However, the FPT is recommended when event rates and dropout rates are high.
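
As a rough illustration of how the Percentage Method and a person-time rate can diverge, the sketch below computes both on a made-up four-person cohort. The `Subject` class, the function names, the follow-up window `tau`, and the reading of the SPT (events are credited the full window, everyone else their observed time, divided by n × tau) are assumptions drawn from the wording of the abstract, not the speaker's implementation. The FPT is not attempted here because it requires an estimated event rate.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    time: float  # observed follow-up time in years, assumed already capped at tau
    event: bool  # True if the event of interest was observed within the window


def percentage_method(subjects, tau):
    """Fraction of enrollees who had the event or were followed for the full window."""
    complete = sum(1 for s in subjects if s.event or s.time >= tau)
    return complete / len(subjects)


def simplified_person_time(subjects, tau):
    """Assumed reading of the SPT: events count as full follow-up (tau),
    all other subjects contribute their observed time, over n * tau."""
    credited = sum(tau if s.event else min(s.time, tau) for s in subjects)
    return credited / (len(subjects) * tau)


# Hypothetical cohort with a 5-year window (illustrative values only).
cohort = [
    Subject(time=5.0, event=False),  # completed follow-up
    Subject(time=2.0, event=True),   # had the event at year 2
    Subject(time=4.0, event=False),  # dropped out at year 4
    Subject(time=1.0, event=False),  # dropped out at year 1
]

print(percentage_method(cohort, 5.0))       # 0.5
print(simplified_person_time(cohort, 5.0))  # 0.75
```

In this toy cohort the dropout at year 4 counts as entirely lost under the Percentage Method, while the person-time rate credits the four years that were actually observed, which is why the two figures differ.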

Speech 2: Mediation analysis with time-to-event mediator
Speaker: Mengling Liu (New York University)
Abstract: Mediation analysis is often employed in the social and biomedical sciences to facilitate understanding of the effects that an intervention exerts on an outcome, either directly or through a mediator. A motivating question is to study the pathway from circulating level of anti-Mullerian hormone (AMH, the exposure) to age at menopause (the mediator) and then to post-menopausal breast cancer (the outcome). However, a challenge in the AMH mediation analysis is that the mediator variable, age at menopause, is a time-to-event variable and subject to censoring. Statistical methods to handle a censored time-to-event mediator, or an incompletely observed mediator in general, are lacking. We propose a series of statistical inference approaches for mediation analysis with a censored time-to-event mediator. Specifically, an estimating equation-based method is proposed for continuous outcomes and incorporates the censored mediator through mean residual life (MRL) modeling, and a likelihood-based approach is proposed for categorical outcomes and estimates mediation effects through Monte Carlo methods to handle the censored mediator.

Speech 3: Adjustment for covariates in genome-wide association study
Speaker: Tao Wang (Albert Einstein College of Medicine)
Abstract: The genome-wide association study (GWAS) has become a popular approach for identifying common genetic variants associated with complex diseases and quantitative traits. For continuous traits, the linear regression model is a standard, widely used approach in GWAS. Adjustment for covariates is often used to identify the direct effects of genetic variants or to improve statistical power by reducing variability of the trait. However, adjusting for heritable covariates is problematic. Here, we propose a new method for covariate adjustment that incorporates prior GWAS summary statistics to infer the direct biological influence on a given trait and to improve statistical power. In simulation studies, the proposed method maintains good control of the type I error rate under various situations and can achieve higher power than simple linear regression. The method is illustrated by an application to a GWAS analysis.
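
For readers unfamiliar with the baseline that the abstract starts from, the following is a minimal sketch of single-variant linear regression with and without a covariate. It shows only the standard approach, not the speaker's proposed method; the function name `snp_effect`, the simulated data, and all parameter values are hypothetical. The covariate here is simulated independently of the genotype, which is the benign case where adjustment simply reduces residual variance.

```python
import numpy as np


def snp_effect(trait, genotype, covariates=None):
    """Ordinary least squares estimate of the genotype effect and its standard error."""
    trait = np.asarray(trait, dtype=float)
    n = len(trait)
    cols = [np.ones(n), np.asarray(genotype, dtype=float)]
    if covariates is not None:
        # Accept one covariate (length n) or several (n x k).
        cols.extend(np.asarray(covariates, dtype=float).reshape(n, -1).T)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, trait, rcond=None)
    resid = trait - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1])


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
n = 2000
genotype = rng.binomial(2, 0.3, size=n).astype(float)  # allele counts 0/1/2
covariate = rng.normal(size=n)                          # independent of genotype
trait = 0.5 * genotype + 2.0 * covariate + rng.normal(size=n)

beta_raw, se_raw = snp_effect(trait, genotype)
beta_adj, se_adj = snp_effect(trait, genotype, covariate)
print(f"unadjusted: beta={beta_raw:.3f}, se={se_raw:.3f}")
print(f"adjusted:   beta={beta_adj:.3f}, se={se_adj:.3f}")
```

The adjusted fit should show a smaller standard error for the genotype effect, which is the power gain the abstract refers to. When the covariate is itself heritable, this simple adjustment is the step the abstract describes as problematic.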