Yale University
Department of Statistics Seminar

Monday, February 16, 2004

Mark van der Laan
Division of Biostatistics and Statistics
UC Berkeley

Title: Cross-Validated Deletion/Substitution/Addition Algorithms in Learning: Applications in Genomics

Abstract: Suppose that the parameter of interest of the data-generating distribution is identified as the minimizer, over a parameter space, of the expectation of a loss function (i.e., a function of an experimental unit and a candidate parameter value), where we allow the loss function itself to be indexed by (unknown) nuisance parameters of the data-generating distribution. Given a sieve (a sequence of subspaces of the parameter space) based on a parametrization in terms of linear combinations of basis functions, we propose a (leave-a-proportion-out) cross-validated deletion/substitution/addition (D/S/A) algorithm for estimating the parameter of interest. The D/S/A algorithm aims to minimize the empirical risk over all allowed linear combinations of k basis functions, and provides an aggressive alternative to forward/backward selection (and recursive partitioning) procedures. If we choose the sieve to be discrete, with the amount of discretization chosen by cross-validation, this algorithm corresponds to a general adaptive epsilon-net estimator, which we have proved to be minimax adaptive under specified conditions. In particular, this algorithm provides black-box algorithms for multivariate regression and for hazard estimation with censored outcomes such as survival times. In the regression setting, we study the properties of the D/S/A algorithm in simulations. We apply the method to detect binding sites in yeast gene expression experiments, and to regress the replication capacity of HIV on the viral sequence in a sample of HIV-infected patients.
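To make the deletion/substitution/addition idea concrete, here is a minimal, dependency-free sketch. It assumes squared-error loss (plain least-squares regression), a fixed table X of candidate basis-function values (one column per basis function), and repeated random leave-a-proportion-out splits to choose the model size k. The acceptance rules for the three moves are a simplified reading of the abstract, not the authors' published algorithm, and every name here (dsa, cv_dsa, fit, risk, fitted_risk) is hypothetical. The epsilon-net discretization and the nuisance-parameter-indexed losses are not modeled.

```python
import random


def _solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit(X, y, S, ridge=1e-8):
    """Least-squares coefficients for an intercept plus the basis functions in S.

    A tiny ridge term (an assumption of this sketch) keeps the normal equations nonsingular.
    """
    Z = [[1.0] + [row[j] for j in S] for row in X]
    p = len(Z[0])
    A = [[sum(z[a] * z[c] for z in Z) + (ridge if a == c else 0.0) for c in range(p)]
         for a in range(p)]
    rhs = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(p)]
    return _solve(A, rhs)


def risk(X, y, S, beta):
    """Empirical risk (mean squared error) of the linear combination beta on (X, y)."""
    preds = [beta[0] + sum(b * row[j] for b, j in zip(beta[1:], S)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, preds)) / len(y)


def fitted_risk(X, y, S):
    return risk(X, y, S, fit(X, y, S))


def dsa(X, y, max_size, tol=1e-12):
    """Hypothetical D/S/A search; returns {size: (empirical risk, best subset found)}."""
    p = len(X[0])
    S = ()
    best = {0: (fitted_risk(X, y, S), S)}
    while True:
        outside = [k for k in range(p) if k not in S]
        # Deletion: accept only if it beats the best subset of size |S| - 1 so far.
        if S:
            r, T = min((fitted_risk(X, y, T), T)
                       for T in (tuple(i for i in S if i != j) for j in S))
            if r < best[len(T)][0] - tol:
                S, best[len(T)] = T, (r, T)
                continue
        # Substitution: swap one selected basis function for an unselected one.
        subs = [tuple(sorted((set(S) - {j}) | {k})) for j in S for k in outside]
        if subs:
            r, T = min((fitted_risk(X, y, T), T) for T in subs)
            if r < best[len(T)][0] - tol:
                S, best[len(T)] = T, (r, T)
                continue
        # Addition: grow the model until it holds max_size basis functions.
        if len(S) < max_size and outside:
            r, T = min((fitted_risk(X, y, T), T)
                       for T in (tuple(sorted(S + (k,))) for k in outside))
            S = T
            if len(T) not in best or r < best[len(T)][0]:
                best[len(T)] = (r, T)
            continue
        return best


def cv_dsa(X, y, max_size, n_splits=5, holdout=0.2, seed=0):
    """Choose k by repeated leave-a-proportion-out CV (assumes max_size <= #columns)."""
    rng = random.Random(seed)
    n = len(y)
    cv_risk = [0.0] * (max_size + 1)
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        n_val = max(1, int(holdout * n))
        val, train = idx[:n_val], idx[n_val:]
        Xt, yt = [X[i] for i in train], [y[i] for i in train]
        Xv, yv = [X[i] for i in val], [y[i] for i in val]
        best = dsa(Xt, yt, max_size)
        # Score the best training subset of each size on the held-out proportion.
        for k in range(max_size + 1):
            _, S = best[k]
            cv_risk[k] += risk(Xv, yv, S, fit(Xt, yt, S)) / n_splits
    k_hat = min(range(max_size + 1), key=cv_risk.__getitem__)
    # Refit on all the data at the selected size.
    return k_hat, dsa(X, y, k_hat)[k_hat][1]
```

One search up to the largest size already records a best subset for every smaller size, so the sketch runs dsa only once per split; the stopping and acceptance rules of the actual procedure may differ.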

Joint work with Sandrine Dudoit, Annette Molinaro, and Sandra Sinisi

Seminar to be held in Room 107, 24 Hillhouse at 4:15 pm
