Frédéric Ferraty, University of Toulouse

"Stepwise algorithm for variable selection in high-dimensional nonparametric regression setting"

(Joint work with Peter Hall)

High-dimensional statistics is a modern and dynamic research area. It covers the many situations where the number of variables is much larger than the sample size. This is the case in genomics, where one observes the expression levels of thousands, or even tens of thousands, of genes; typically one has at hand a small sample of high-dimensional vectors derived from a large set of covariates. A particular setting is the observation of a collection of curves, surfaces, and so on, sampled at high frequencies (that is, at many measurement points). The main feature of such functional data is the strong collinearity between explanatory variables, which reduces the overall dimensionality of the data.

The last twenty years have been devoted to developing successful methodologies able to handle such high-dimensional data. Essentially, sparse linear modelling involving variable selection techniques has been proposed to investigate high-dimensional vectors, whereas non-selective linear approaches have been introduced to handle functional data.

However, as in the standard multivariate setting, the linearity assumption may be too restrictive and may hide relevant nonlinear features. This is why, in the last decade, flexible methodologies that take nonlinear relationships into account have been developed to better understand the structure of such high-dimensional data.

This talk presents approaches connecting nonparametric modelling with selective methods in order to handle nonlinear relationships in a large set of covariates, with some heuristics oriented towards the functional data setting. Some datasets illustrate the finite-sample properties of the proposed methods.