## Statistics 230/530, Spring 1996

Introductory Data Analysis

**Instructor:** J. A. Hartigan

**Teaching Assistant:** Alexandra Thiry

**Class hours:** 4-6 Thursdays at the Stat Lab, 140 Prospect

This class provides an introduction to statistical techniques in
sampling, regression, analysis of variance, time series analysis,
categorical data
analysis, clustering and classification, and factor analysis. Although no prior
experience in statistics is assumed, the course does not provide a comprehensive
survey of statistical theory; we present enough theory to interpret the
computations. Statistics 123 or its equivalent is required for the course. The SPLUS lab course STAT200 is a corequisite.

The course is intended to show you, by example, how to apply
statistical techniques to real data problems. Your
work in the course consists of reports due each week describing a statistical
analysis of some data set, either your own data or, less desirably, a data set
from our library DATALIBE. There will be no exams. To do the analyses, we will
use the statistical package SPLUS, for which documentation is available at the
Statistics Laboratory.

We have a technique-of-the-week lecture in Dana 107 (opposite the Health
Plan) at 2.30-3.45 on Tuesdays, followed by a lab session at the Stat Lab, 140
Prospect, at 2.30-3.45 on Thursdays, where you apply the technique. The lab
session continues into the optional class hours, 4-6 on Thursdays, also held
in the Stat Lab.
If you can, plan to spend that time completing your assignment for the week,
while someone is there to help. Reports are due the Tuesday following
the lab. There is a penalty for late reports.

Course notes, in the form of a Statistical Analysis Workbook that
includes the data and analyses needed to follow the lectures and do the
assignments, are available at TYCO Copy Center for about $20.

#### CONTENTS

- A single variable
- The normal distribution
- Summary statistics
- Straight line regression
- Multivariate regression
- Stepwise regression
- Analysis of variance, unmatched samples
- Factorial analysis of variance
- Nesting and unbalanced designs
- Categorical data
- Log linear models, logistic regression
- Sampling
- Causal models
- Time series
- Clustering and classification
- Factor analysis
- Comics