Statistics 312/612 (Fall 2010)

Linear models

Instructor: David Pollard
When: Monday, Friday 11:35 - 12:50
Where: 24 Hillhouse, main classroom
Office hours: tba
TA: Sanghee Cho
Problem session: to be arranged if needed
Other: courses taught by DP in previous years
Short description: The geometry of least squares; distribution theory for normal errors; regression, analysis of variance, and designed experiments; numerical algorithms (with particular reference to the R statistical language). Linear algebra and some acquaintance with statistics assumed.
Intended audience: The course is aimed at students (both graduate and undergraduate) who have had some introductory exposure to probability (random variables, expected values, variances and covariances, density functions), linear algebra (matrices, orthonormal bases) and possibly some inference.
Students will be expected to learn a little about the R statistical language. [These days, every serious statistician has to know something about at least one statistical package. At least in academic statistical circles, R is the de facto standard. And it is free.]
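A minimal sketch of the kind of R usage the course has in mind (purely illustrative; the 'cars' data set ships with R and has no connection to the course):

    ## Fit a straight line to the built-in 'cars' data (stopping distance vs speed)
    fit <- lm(dist ~ speed, data = cars)   # least-squares fit
    summary(fit)                    # coefficients, standard errors, R^2, F statistic
    coef(fit)                       # fitted intercept and slope
    plot(dist ~ speed, data = cars) # scatterplot of the raw data
    abline(fit)                     # add the fitted line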
Text:
  • There is no single textbook for the course; I have drawn material from many sources.
  • The online help that comes with R is a good source for learning about the language. The documentation at CRAN (Comprehensive R Archive Network) includes a gentle Introduction to R together with more detailed manuals.
    Many statisticians seem to like the Venables and Ripley book (see the references), even though it "is not a text in statistical theory, but does cover modern statistical methodology".
  • See also the handouts for special topics.
Grading: The final grade will be based entirely on the weekly homework, which is due each WHEN?

Students who wish to work in teams (no more than 2 to a team) should submit a single solution set. Each member of a team will be expected to understand the team's solutions sufficiently well to explain the reasoning at the blackboard. Occasional meetings with DP will be arranged.

Topics: Tentative list. The actual material covered will depend, in part, on the backgrounds of students in the class.
  • brief introduction to the R language:
    • matrices and data frames; lm(); lm.object; summary(); etc.
    • interpretation of output from R
  • fitting by least squares (a small R sketch appears after this topic list):
    • orthonormal bases; QR decompositions
    • hat matrices (projections)
    • singular value decompositions
    • possible singularity of design matrix; near collinearity; stability of the fit
  • variances and covariances:
    • Gauss-Markov theorem
    • estimation of variance using residual sum of squares
    • weighted least squares
    • instrumental variables and two-stage least squares
    • variance minimization; principal components
    • ridge regression
    • overparametrized models; estimable functions; contrasts
    • effect of near collinearity on covariances
    • diagnostics and plots
  • statistical theory for normal errors:
    • multivariate normal and rotation of axes
    • chi-square, t, and F distributions
    • hypothesis testing
    • noncentral chi-square and power of tests
    • ANOVA tables
    • choice of contrasts for analysis of variance
  • random effects and mixed models
  • randomization and permutation distributions:
    • Fisher's justification of normal approximation via randomization
    • experimental design: orthogonal and unbalanced designs; blocking; factorial designs
  • causality and the interpretation of fitted models; ecological regression
  • analysis of transformations and departures from additivity
  • LAD and robust alternatives to least squares
  • categorical data and log-linear models
  • generalized linear models
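For concreteness, here is a small R sketch (not part of the course materials; the simulated data and variable names are made up) tying together a few of the least-squares topics above: the QR decomposition, the hat matrix, and a check against lm().

    ## Simulate a small data set with known coefficients
    set.seed(1)
    n <- 20
    x <- runif(n)
    y <- 2 + 3 * x + rnorm(n)
    X <- cbind(1, x)                 # design matrix with an intercept column

    ## Least squares via the QR decomposition of X
    qrX  <- qr(X)
    beta <- qr.coef(qrX, y)          # least-squares coefficients

    ## The hat matrix projects y onto the column space of X
    H <- X %*% solve(crossprod(X)) %*% t(X)
    yhat <- as.vector(H %*% y)       # fitted values by projection

    ## Same answers from lm()
    fit <- lm(y ~ x)
    all.equal(unname(beta), unname(coef(fit)))   # TRUE
    all.equal(yhat, unname(fitted(fit)))         # TRUE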

DBP 25 Aug 2010