Statistics 312/612 (Fall 2016)
revised 4 Sept 2016
Monday, Wednesday 11:35 - 12:50
to be arranged if needed
The geometry of least squares; distribution theory for normal errors; regression, analysis of variance, and designed experiments; numerical algorithms (with particular reference to the R statistical language). Linear algebra and some acquaintance with statistics assumed.
The course is aimed at students (both graduate and undergraduate)
who have had some introductory exposure to probability (random variables, expected values, variances and covariances, density functions), linear algebra (matrices, orthonormal bases) and possibly some inference.
Students will be expected to learn a little about
the R statistical language.
[These days, every serious statistician has to know something about
at least one statistical package. At least in academic statistical circles,
R is the de facto standard for interactive use. And it is free.]
There is no single textbook for the course;
I have drawn material from many sources.
I will provide online handouts.
The online help that comes with R is a good source for learning about the language. The documentation at
CRAN (Comprehensive R Archive Network) includes a gentle Introduction to R together with more detailed manuals.
Many statisticians seem to like the Venables and Ripley book (see the references), even though it "is not a text in statistical theory, but does cover modern statistical methodology".
See also the handouts for special topics.
Advice from an unbiased expert with many years of experience interacting with students.
In the past I have always
based the final grade entirely on the weekly homework.
This year I am also considering a final exam, but I am open to
creative suggestions of alternatives.
Students who wish to work in teams (no more than 2 to a team)
should submit a single solution set, with both members of the team involved
in the solution of each problem.
Tentative list. The actual material covered will depend, in part,
on the backgrounds of students in the class. From past experience I know
there is not enough time to cover every topic in the detail it deserves.
- brief introduction to the R language:
  - matrices and data frames; lm(); lm.object; summary(); etc.
  - interpretation of output from R
- fitting by least squares:
  - orthonormal bases; Q-R decompositions
  - hat matrices (projections)
  - singular value decompositions
  - possible singularity of design matrix; near collinearity; stability of the fit
  - variances and covariances
  - Gauss-Markov theorem
  - estimation of variance using residual sum of squares
  - weighted least squares
  - instrumental variables and two-stage least squares
  - variance minimization; principal components
  - ridge regression
  - overparametrized models; estimable functions; contrasts
  - effect of near collinearity on covariances
  - diagnostics and plots
- statistical theory for normal errors:
  - multivariate normal and rotation of axes
  - chi-square, t, and F distributions
  - hypothesis testing
  - noncentral chi-square and power of tests
  - ANOVA tables
  - choice of contrasts for analysis of variance
  - random effects and mixed models
- randomization and permutation distributions
- Fisher's justification of normal approximation via randomization
- experimental design: orthogonal and unbalanced designs; blocking
- causality and the interpretation of fitted models; ecological regression
- analysis of transformations and departures from additivity
- LAD and robust alternatives to least squares
- categorical data and log-linear models
- generalized linear models