Statistics 230/530, Spring 1996
Introductory Data Analysis

Instructor: J. A. Hartigan
Teaching Assistant: Alexandra Thiry
Class hours: 4-6 Thursday at the Stat Lab, 140 Prospect

This class provides an introduction to statistical techniques in sampling, regression, analysis of variance, time series analysis, categorical data analysis, clustering and classification, and factor analysis. Although no prior experience in statistics is assumed, the course does not provide a comprehensive survey of statistical theory; we present enough theory to interpret the computations. Statistics 123 or its equivalent is required for the course. The SPLUS lab course STAT200 is a corequisite.

The course is intended to show you, by example, how to apply statistical techniques to real data problems. Your work in the course consists of weekly reports, each describing a statistical analysis of a data set: either your own data or, less desirably, a data set from our library DATALIBE. There will be no exams. To do the analyses, we will use the statistical package SPLUS, for which documentation is available at the Statistics Laboratory.

We have a technique-of-the-week lecture in Dana 107 (opposite the Health Plan) at 2.30-3.45 on Tuesdays, followed by a lab session applying the technique at the Stat Lab, 140 Prospect, at 2.30-3.45 on Thursdays. The lab session continues through the optional class hours, 4-6 on Thursdays, also held in the Stat Lab. If you can, plan to spend that time completing your assignment for the week, while there is someone there to help. Reports are due the Tuesday following the lab. There is a penalty for late reports.

Course notes, in the form of a Statistical Analysis Workbook that includes the data and analyses necessary for following the lectures and doing the assignments, are available at TYCO Copy Center for about $20.

CONTENTS

  1. A single variable
  2. The normal distribution
  3. Summary statistics
  4. Straight line regression
  5. Multivariate regression
  6. Stepwise regression
  7. Analysis of variance, unmatched samples
  8. Factorial analysis of variance
  9. Nesting and unbalanced designs
  10. Categorical data
  11. Log linear models, logistic regression
  12. Sampling
  13. Causal models
  14. Time series
  15. Clustering and classification
  16. Factor analysis
  17. Comics