Session 41: Functional and high dimensional data – Conference on Statistical Learning and Data Science / Nonparametric Statistics

Session title: Functional and high dimensional data
Organizer: Aurore Delaigle (U of Melbourne)
Chair: Aurore Delaigle (U of Melbourne)
Time: June 6th, 1:15pm – 2:45pm
Location: VEC 1403

Speech 1: Small Sample Confidence Intervals for the ACL (Abdushukurov, Cheng, and Lin) Estimators Under the Proportional Hazards Model
Speaker: Emad Abdurasul (James Madison University)
Abstract: We develop a saddlepoint-based method for generating small sample confidence bands for the population survival function from the ACL estimators, under the proportional hazards model. In the process, we derive the exact distribution of this estimator and develop mid-population tolerance bands for saddlepoint estimators. Our method depends upon the Mellin transform of the zero-truncated survival estimator. This transform is inverted via saddlepoint approximation to yield a highly accurate approximation to the cumulative distribution function of the respective cumulative hazard function. This distribution function is then inverted to produce our saddlepoint confidence bands. We then compare our saddlepoint confidence bands with those obtained from competing large sample methods, as well as with those obtained from the exact distribution. In our simulation study, we found that the saddlepoint confidence bands are very close to the confidence bands derived from the exact distribution. In addition to being close, they are easier to compute, and they outperform the large sample methods in terms of probability convergence.

Speech 2: Binary functional linear models in a stratified sampling setting
Speaker: Sophie Dabo-Niang (Université Lille 3)
Abstract: A functional binary choice model is explored in a stratified sample design context. In other words, a model is considered in which the response is binary, the explanatory variable is functional, and the sample is stratified with respect to the values of the response variable. A dimension reduction of the space of the explanatory random function based on a Karhunen–Loève expansion is used to define a conditional maximum likelihood estimate of the model. Based on this formulation, several asymptotic properties are given. Numerical studies are used to compare the proposed method with the ordinary maximum likelihood method, which ignores the nature of the sampling. The proposed model yields encouraging results. The potential of the functional sampling model for integrating special non-random features of the sample, which would have been difficult to see otherwise, is also outlined.
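
The dimension-reduction step in this abstract can be illustrated with a minimal sketch. It is not the speaker's estimator: it projects simulated curves onto the leading empirical Karhunen–Loève component and then fits an ordinary (unconditional) logistic regression on the resulting score, i.e. the comparator that ignores the sampling design. The simulated data, the grid size, the single retained component, and the helper names `leading_eigenpair` and `fit_logistic_1d` are all assumptions made for illustration.

```python
import math
import random

def leading_eigenpair(cov, iters=200):
    """Power iteration for the largest eigenpair of a symmetric PSD matrix."""
    m = len(cov)
    v = [1.0 / math.sqrt(m)] * m
    lam = 0.0
    for _ in range(iters):
        w = [sum(cov[j][k] * v[k] for k in range(m)) for j in range(m)]
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]
    return lam, v

def fit_logistic_1d(scores, labels, iters=25):
    """Newton iterations for logit P(Y=1) = b0 + b1 * score (ordinary ML)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * s)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * s
            h00 += w
            h01 += w * s
            h11 += w * s * s
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

rng = random.Random(1)
m, n = 30, 300  # grid size and number of curves (illustrative values)
grid = [(j + 0.5) / m for j in range(m)]

# Simulated curves (assumption): two orthonormal sine components.
xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
eta = [rng.gauss(0.0, 0.3) for _ in range(n)]
curves = [[math.sqrt(2) * (xi[i] * math.sin(math.pi * t)
                           + eta[i] * math.sin(2 * math.pi * t))
           for t in grid] for i in range(n)]
# Binary response driven by the first component (true slope 1.5, assumption).
labels = [1 if rng.random() < 1.0 / (1.0 + math.exp(-1.5 * xi[i])) else 0
          for i in range(n)]

# Empirical covariance of the centered, discretized curves.
mean = [sum(c[j] for c in curves) / n for j in range(m)]
centered = [[c[j] - mean[j] for j in range(m)] for c in curves]
cov = [[sum(c[j] * c[k] for c in centered) / n for k in range(m)]
       for j in range(m)]

# Leading eigenfunction, rescaled so that (1/m) * sum(phi^2) = 1.
_, v = leading_eigenpair(cov)
phi = [x * math.sqrt(m) for x in v]

# Karhunen-Loeve scores: Riemann-sum inner products with the eigenfunction.
scores = [sum(c[j] * phi[j] for j in range(m)) / m for c in centered]

b0, b1 = fit_logistic_1d(scores, labels)
print("intercept:", b0, "slope on first score:", b1)
```

A conditional likelihood that accounts for stratification on the response, as the abstract proposes, would replace the plain logistic fit in the last step; the projection onto the Karhunen–Loève scores would stay the same.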

Speech 3: Functional CLT and sharp bounds for some (conditional Poisson) survey sampling plans with applications to big (tall) data
Speaker: Patrice Bertail (Université Paris Nanterre)
Abstract: Subsampling methods, as well as general sampling methods, appear as natural tools to handle very large databases (big data in the individual dimension) when traditional statistical methods or statistical learning algorithms cannot be implemented on datasets that are too large. The choice of the weights of the survey sampling scheme may reduce the loss implied by the choice of a much smaller sample size (according to the problem of interest). We will first recall some asymptotic results for general survey sampling based empirical processes indexed by classes of functions (see Bertail and Clémençon, 2017, Scandinavian Journal of Statistics), for Poisson type and conditional Poisson (rejective) survey sampling. These results may be extended to a large class of survey sampling plans via the notion of negative association of most survey sampling plans (Bertail and Rebecq, 2018). Then, with a view to generalizing some statistical learning tasks to sampled data, we will obtain exponential bounds for the probabilities of deviation of a Horvitz-Thompson sum from its expectation when the variables involved in the summation are obtained by sampling in a finite population according to an associated or rejective scheme.
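
For readers unfamiliar with the Horvitz-Thompson sum mentioned above, the following minimal sketch shows the estimator under plain Poisson sampling, where each unit is drawn independently with its own inclusion probability. It does not implement the rejective (conditional Poisson) scheme or the bounds from the talk. The population, the choice of inclusion probabilities, and the function names `poisson_sample` and `horvitz_thompson_total` are assumptions made for illustration.

```python
import random

def poisson_sample(inclusion_probs, rng):
    """Poisson sampling: unit i is selected independently with probability pi_i."""
    return [i for i, p in enumerate(inclusion_probs) if rng.random() < p]

def horvitz_thompson_total(values, inclusion_probs, sample):
    """Horvitz-Thompson estimate of the population total: sum of y_i / pi_i."""
    return sum(values[i] / inclusion_probs[i] for i in sample)

rng = random.Random(0)
N, n_expected = 2000, 200  # population size and expected sample size (illustrative)

# Hypothetical finite population.
values = [rng.uniform(0.0, 10.0) for _ in range(N)]
true_total = sum(values)

# Assumed weighting: inclusion probabilities proportional to (y_i + 1),
# scaled so that they sum to the expected sample size.
inclusion_probs = [n_expected * (y + 1.0) / (true_total + N) for y in values]

# Repeat the sampling to see that the estimator is centered on the true total.
estimates = []
for _ in range(400):
    sample = poisson_sample(inclusion_probs, rng)
    estimates.append(horvitz_thompson_total(values, inclusion_probs, sample))

mean_estimate = sum(estimates) / len(estimates)
print("true total:", true_total)
print("mean of Horvitz-Thompson estimates:", mean_estimate)
```

Dividing each sampled value by its inclusion probability is what makes the sum unbiased for the population total, so units that are unlikely to be drawn count for more when they do appear. The deviation bounds discussed in the abstract quantify how far a single such estimate can fall from that total.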