Instructor: David Pollard

When: Monday, Friday 11:35–12:50

Where: 24 Hillhouse, main classroom

Office hours: TBA

TA: Sanghee Cho

Problem session: to be arranged if needed

Other: courses taught by DP in previous years

Short description: 
The geometry of least squares; distribution theory for normal errors; regression, analysis of variance, and designed experiments; numerical algorithms (with particular reference to the R statistical language). Linear algebra and some acquaintance with statistics assumed.

Intended audience:

The course is aimed at students (both graduate and undergraduate)
who have had some introductory exposure to probability (random variables, expected values, variances and covariances, density functions), linear algebra (matrices, orthonormal bases) and possibly some inference.
Students will be expected to learn a little about
the R statistical language.
[These days, every serious statistician has to know something about at least one statistical package. At least in academic statistical circles, R is the de facto standard. And it is free.]
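As a small taste of the R work expected in the course, the following sketch (with simulated data and invented variable names, purely for illustration) fits a straight line by least squares:

```r
# Simulated data: a straight-line relationship with normal noise
set.seed(1)
x <- 1:20
y <- 3 + 2 * x + rnorm(20)

# Fit by least squares and inspect the result
fit <- lm(y ~ x)
summary(fit)   # coefficient table, residual standard error, R-squared
coef(fit)      # estimated intercept and slope
```
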

Text:


There is no single textbook for the course;
I have drawn material from many sources.

The online help that comes with R is a good source for learning about the language. The documentation at
CRAN (Comprehensive R Archive Network) includes a gentle Introduction to R together with more detailed manuals.
Many statisticians seem to like the Venables and Ripley book (see the references), even though it "is not a text in statistical theory, but does cover modern statistical methodology".
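Within an R session itself, the help system is often the quickest reference; for example:

```r
help(lm)                    # documentation for the lm() function (also ?lm)
help.search("regression")   # search the installed documentation
example(lm)                 # run the examples from the lm() help page
```
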

See also the handouts for special topics.

Grading:

The final grade will be based entirely on the weekly homework, which is due each WHEN?
Students who wish to work in teams (no more than 2 to a team)
should submit a single solution set. Each member of a team will be
expected to understand the team's solutions sufficiently well to
explain the reasoning at the blackboard. Occasional meetings with DP
will be arranged.

Topics:

Tentative list. The actual material covered will depend, in part, on the backgrounds of students in the class.
- brief introduction to the R language:
  - matrices and data frames; lm(); lm.object; summary(); etc.
  - interpretation of output from R
- fitting by least squares:
  - orthonormal bases; QR decompositions
  - hat matrices (projections)
  - singular value decompositions
  - possible singularity of the design matrix; near collinearity; stability of the fit
  - variances and covariances
  - Gauss-Markov theorem
  - estimation of variance using the residual sum of squares
  - weighted least squares
  - instrumental variables and two-stage least squares
  - variance minimization; principal components
  - ridge regression
  - overparametrized models; estimable functions; contrasts
  - effect of near collinearity on covariances
  - diagnostics and plots
- statistical theory for normal errors:
  - multivariate normal and rotation of axes
  - chi-square, t, and F distributions
  - hypothesis testing
  - non-central chi-square and power of tests
  - ANOVA tables
  - choice of contrasts for analysis of variance
- random effects and mixed models
- randomization and permutation distributions
- Fisher's justification of the normal approximation via randomization
- experimental design: orthogonal and unbalanced designs; blocking; factorial designs
- causality and the interpretation of fitted models; ecological regression
- analysis of transformations and departures from additivity
- LAD and robust alternatives to least squares
- categorical data and log-linear models
- generalized linear models
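Several of the least-squares topics above (QR decompositions, hat matrices, and their relation to lm()) can be previewed in a few lines of R. This is only a sketch, using simulated data:

```r
# Simulated data from a known linear model
set.seed(2)
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))          # design matrix with intercept column
beta <- c(1, 2, -1)
y <- as.numeric(X %*% beta + rnorm(n))

# Least squares via the QR decomposition, much as lm() does internally
qrX <- qr(X)
bhat <- qr.coef(qrX, y)

# Hat matrix: orthogonal projection onto the column space of X
H <- X %*% solve(crossprod(X)) %*% t(X)
fitted_vals <- H %*% y                     # same as X %*% bhat

# Agreement with lm() (the -1 suppresses the automatic intercept,
# since X already contains a column of ones)
fit <- lm(y ~ X - 1)
all.equal(as.numeric(bhat), as.numeric(coef(fit)))
```

Forming the hat matrix explicitly via solve(crossprod(X)) is done here only to exhibit the projection; in practice the QR route is preferred for numerical stability, which is one of the points the course takes up.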
