Statistics 361a/661a, Fall 1996

Data Analysis

Instructor: Joseph Chang
Office: 24 Hillhouse Ave., room 204
Phone: 432-0642
Email: chang@stat.yale.edu

Teaching assistant: Alexandra Thiry

Class times and locations: Mondays, 2:30-3:45, in Becton 102
and Wednesdays, 2:30-3:45, in the Statlab, 140 Prospect St.

Prerequisites: The prerequisite in statistical theory is Stat 242b. As for computing, we will be using the statistical language S-plus; see below for more information about this.

This class: We will study principles, methods, and examples of data analysis. Although we want to avoid a "cookbook" approach and develop some theoretical understanding of what we are doing, the emphasis is definitely on data analysis in practice more than in theory. Here is a list of topics I'd like to discuss, although I doubt we will cover all of them.

  1. Looking at data: various useful plots, density estimation.
  2. Bootstrap, permutation methods, cross-validation.
  3. Linear models: linear regression, analysis of variance, residuals, influence, model selection.
  4. Generalized linear models: logistic regression, loglinear models, contingency tables.
  5. Maximum likelihood, EM algorithm, mixture models.
  6. Multivariate analysis: principal components, factor analysis, latent variables and causal models.
  7. "Modern regression": projection pursuit, neural networks.
  8. Bayesian calculations, Gibbs sampler.
  9. Time series, spatial data, point processes.
OK, that's a lot. Don't be alarmed.

The plan as of now: The class will meet twice each week. Mondays will be a lecture in Becton 102, giving a conceptual introduction to some new techniques of data analysis. On Wednesdays we will meet in the Statlab to try those techniques out on examples and see how they apply to real data.

I hope the class (and particularly the sessions in the Statlab) will be informal and have lots of discussion so that we can learn from each other!

More about S-plus: The statistical computing in this class will be done using S-plus, a powerful language and programming environment for interactive data analysis and graphics. The official line is that a knowledge of S-plus is a prerequisite or corequisite for this course. Luckily, our department offers a laboratory, Stat 200L, each semester that provides a nice, quick, painless introduction to S-plus. The lab meets in the Statlab once a week, on Fridays at 2:30, a time that I believe should be convenient for students taking Stat 361/661. The first 4 weeks of the lab are meant to be a "crash course" in S-plus. I strongly recommend that students who have not studied S-plus attend Stat 200L. Of course, you are welcome to take it for credit. If you'd prefer, you are also welcome simply to come along for a while (e.g. 4 weeks) and learn what you need to know about S-plus, without signing up formally for credit. Please see the course description and syllabus for Stat 200L.

Requirements and grading: Grades will be based on homework and a project. Homework will be assigned every week or two. The project will be due on the day the final exam would have taken place. For the project you can obtain some data on any topic that interests you, analyze it, and write a report. Should be fun!

Homework #1:

Please send me an email message.

You don't have to say anything in particular; just put the words "Stat 361" or "Stat 661" in the subject line. One purpose is simply so that I can make a mailing list. Also, if you have any questions or comments, I'd like to hear them!