Monday, February 24, 2003
Hannes Leeb
Department of Statistics
University of Vienna

Properties of Confidence Intervals in Regression After Variable Selection
We analyze the coverage probability of confidence intervals that are constructed after a data-based model selection step. In particular, we consider a "naive" confidence interval that is constructed by ignoring the presence of model selection, i.e., as if the selected model had been given a priori. We study the actual coverage probability of this "naive" interval analytically, and we describe the corresponding minimal coverage probability as a function of observable quantities. For a "naive" interval constructed with a nominal coverage probability of, say, 0.95, we find that the actual coverage probability can be well below 0.5. If the "naive" interval is corrected by increasing its length so that it attains the nominal coverage probability, the resulting confidence interval is always longer (in expectation) than the standard interval obtained by fitting the overall model.
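The effect described above can be reproduced in a small Monte Carlo experiment. The sketch below is not the analysis from the talk. It relies on several assumptions chosen only for illustration: a two-regressor model y = beta1*x1 + beta2*x2 + noise, a known error standard deviation (so z-intervals replace t-intervals), and a pretest that drops x2 when its t-statistic in the full model is below the same 1.96 cutoff. The function names `make_design` and `naive_coverage` and the defaults (rho = 0.9, n = 50) are hypothetical. The "naive" interval for beta1 is then built from whichever model the pretest selected.

```python
import math
import random


def make_design(n, rho, rng):
    """Hypothetical fixed design: two regressors with correlation roughly rho."""
    x1, x2 = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x1.append(z1)
        x2.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x1, x2


def naive_coverage(beta1, beta2, rho=0.9, n=50, sigma=1.0, reps=5000, seed=1):
    """Estimate the coverage of the naive 95% interval for beta1 after a pretest on beta2.

    Assumes sigma is known, so every interval is estimate +/- 1.96 * standard error.
    """
    rng = random.Random(seed)
    x1, x2 = make_design(n, rho, rng)
    # Entries of X'X and of its inverse (2x2 closed form).
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    det = s11 * s22 - s12 ** 2
    inv11, inv22, inv12 = s22 / det, s11 / det, -s12 / det
    z = 1.959963984540054  # 97.5% quantile of the standard normal

    covered = 0
    for _ in range(reps):
        y = [beta1 * a + beta2 * b + rng.gauss(0.0, sigma) for a, b in zip(x1, x2)]
        r1 = sum(a * v for a, v in zip(x1, y))  # x1'y
        r2 = sum(b * v for b, v in zip(x2, y))  # x2'y
        # Full-model OLS estimates: (X'X)^{-1} X'y.
        b1_full = inv11 * r1 + inv12 * r2
        b2_full = inv12 * r1 + inv22 * r2
        t2 = b2_full / (sigma * math.sqrt(inv22))
        if abs(t2) > z:
            # Pretest keeps x2: use the full-model estimate and standard error.
            est, se = b1_full, sigma * math.sqrt(inv11)
        else:
            # Pretest drops x2: regress y on x1 alone, as if that model were given a priori.
            est, se = r1 / s11, sigma / math.sqrt(s11)
        if abs(est - beta1) <= z * se:
            covered += 1
    return covered / reps


if __name__ == "__main__":
    for b2 in (0.0, 0.3, 0.6, 1.0, 100.0):
        print(f"beta2 = {b2:6.1f}  naive coverage ~ {naive_coverage(1.0, b2):.3f}")
```

When beta2 is far from zero, the pretest always keeps x2 and the coverage is close to the nominal 0.95. For moderate values of beta2, the pretest often drops x2, and the reduced-model estimate of beta1 is then biased by roughly (x1'x2 / x1'x1) * beta2. Under this setup, the naive interval can cover far less often than 95%. The exact numbers depend on the assumed design and are not taken from the talk.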