Statistics 312/612 (Fall 2016)

Linear models


revised 4 Sept 2016


Instructor: David Pollard
When: Monday, Wednesday 11:35 - 12:50
Where: WTS A60
Office hours: Thursday 4:00 - 5:30
TA: Yu Lu
Problem session: to be arranged if needed
Other: courses taught by DP in previous years
Short description: The geometry of least squares; distribution theory for normal errors; regression, analysis of variance, and designed experiments; numerical algorithms (with particular reference to the R statistical language). Linear algebra and some acquaintance with statistics assumed.
Intended audience: The course is aimed at students (both graduate and undergraduate) who have had some introductory exposure to probability (random variables, expected values, variances and covariances, density functions), linear algebra (matrices, orthonormal bases) and possibly some inference.
Students will be expected to learn a little about the R statistical language. [These days, every serious statistician has to know something about at least one statistical package. At least in academic statistical circles, R is the de facto standard for interactive use. And it is free.]
Text:
  • There is no single textbook for the course; I have drawn material from many sources. I will provide online handouts constructed using RStudio.
  • The online help that comes with R is a good source for learning about the language. The documentation at CRAN (Comprehensive R Archive Network) includes a gentle Introduction to R together with more detailed manuals.
    Many statisticians seem to like the Venables and Ripley book (see the references), even though it "is not a text in statistical theory, but does cover modern statistical methodology".
  • See also the handouts for special topics.
Expert advice: Advice from an unbiased expert with many years of experience interacting with students.
Grading: In the past I have always based the final grade entirely on the weekly homework. This year I am also considering a final exam, but I am open to creative suggestions of alternatives.

Students who wish to work in teams (no more than 2 to a team) should submit a single solution set, with both members of the team involved in the solution of each problem.

Topics: Tentative list. The actual material covered will depend, in part, on the backgrounds of students in the class. From past experience I know there is not enough time to cover every topic in the detail it deserves.
  • brief introduction to the R language:
    • matrices and data frames; lm(); lm.object; summary(); etc.
    • interpretation of output from R
  • fitting by least squares:
    • orthonormal bases; Q-R decompositions
    • hat matrices (projections)
    • singular value decompositions
    • possible singularity of design matrix; near collinearity; stability of the fit
  • variances and covariances
    • Gauss-Markov theorem
    • estimation of variance using residual sum of squares
    • weighted least squares
    • instrumental variables and two-stage least squares
    • variance minimization; principal components
    • ridge regression
    • overparametrized models; estimable functions; contrasts
    • effect of near collinearity on covariances
    • diagnostics and plots
  • statistical theory for normal errors:
    • multivariate normal and rotation of axes
    • chi-square, t, and F distributions
    • hypothesis testing
    • noncentral chi-square and power of tests
    • ANOVA tables
    • choice of contrasts for analysis of variance
  • random effects and mixed models
  • randomization and permutation distributions
    • Fisher's justification of normal approximation via randomization
    • experimental design: orthogonal and unbalanced designs; blocking; factorial designs
  • causality and the interpretation of fitted models; ecological regression
  • analysis of transformations and departures from additivity
  • LAD and robust alternatives to least squares
  • categorical data and log-linear models
  • generalized linear models