
Statistics 200: Lab 11 (Friday 11 April 1997)

Today's tasks:
Time series. The great New Haven garbage data.

The data set for this week's lab was collected some years ago as part of a project to predict the amounts of garbage that would have been collected in New Haven had it not been for the introduction of recycling. The help file describes in more detail the form of the data, as it appears in the Garbage data file. For Splus users, it might be more convenient to source() the data from the dump file.

Problem 1

Create time series from the vector of weekly tonnages of garbage. Try both cts() and rts(). Try plotting your series using both plot() and ts.plot(). You will need to read the help file to figure out the start dates and frequencies. Compare with a simple plot() of the vector of tons. Do the plots as time series add anything to your understanding? Use lag.plot() to see the effects of dependence between nearby weeks.

Problem 2

Use the monthly data for 1986 to "estimate" the missing weekly tonnages. Create a new series. Plot it. While you are at it, you might try to do something about the "spikes" at the holidays. (Are they real effects of garbage-strewn holidays, or are they just the result of one more day of garbage collection? You might try to figure out which weeks had only six days of garbage and were followed by a week with eight days, then replace both by their average.)
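The averaging idea in the parenthesis can be sketched as follows. This is only an illustration in Python rather than Splus, using made-up tonnages; the function name smooth_holiday_pairs and the argument short_weeks are hypothetical, and deciding which weeks are actually short is still up to you.

```python
def smooth_holiday_pairs(tons, short_weeks):
    """Return a copy of tons in which each short week and the week after it
    are both replaced by the mean of the two.

    short_weeks holds 0-based positions of weeks believed to have lost a
    collection day (a hypothetical input; the lab asks you to find them).
    """
    smoothed = list(tons)
    for i in short_weeks:
        if i + 1 >= len(tons):
            continue  # no following week to pair with
        pair_mean = (tons[i] + tons[i + 1]) / 2.0
        smoothed[i] = pair_mean
        smoothed[i + 1] = pair_mean
    return smoothed


# Made-up tonnages: position 2 is low (holiday), position 3 is high (catch-up).
example = [100.0, 102.0, 80.0, 124.0, 101.0]
smoothed = smooth_holiday_pairs(example, short_weeks=[2])
print(smoothed)
```

Replacing both weeks by their average leaves the two-week total unchanged, so the monthly and yearly totals are not disturbed.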

Problem 3

Create a new series, showing the tons of garbage collected each month. Plot it. Which series do you find more informative: weekly or monthly, cts() or rts()?

Problem 4

Fit a linear model using lm(). Use a model formula to predict tons by a linear trend in time (What do you use for time?) and a factor variable for month of the year. Look at the residuals (plotted against what?). Try lag.plot() for the residuals. Notice anything strange?
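To make concrete what a "linear trend plus a factor for month" amounts to, here is a small sketch of the predictor columns such a model uses. It is written in Python rather than Splus, the helper design_row is hypothetical, and using the week number as "time" is only one possible choice. The sketch uses plain indicator (dummy) coding with January as the baseline; lm() may code the factor differently by default, which changes the individual coefficients but not the fitted values.

```python
def design_row(week_index, month):
    """One row of predictors: intercept, linear time, and 11 month indicators.

    month runs from 1 to 12; January is the baseline, so it has no column.
    """
    month_indicators = [1.0 if month == m else 0.0 for m in range(2, 13)]
    return [1.0, float(week_index)] + month_indicators


# Made-up months for four consecutive weeks: two in January, then
# (for illustration only) one in February and one in December.
months = [1, 1, 2, 12]
rows = [design_row(t, month) for t, month in enumerate(months, start=1)]
for row in rows:
    print(row)
```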

You might want to include an indicator function to take out the effect of any troublesome outlier. For example, if week 7 were obnoxious, you could create

bad7 <- 1:length(tons) == 7

Including bad7 as one of the predictors would then effectively remove week 7 from the fit: its residual becomes zero, and the other coefficients are estimated as if that week were absent.
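If you want to convince yourself that an indicator for a single week acts like deleting that week, the following sketch checks it on made-up numbers. It is written in Python rather than Splus, and the helpers solve and ols are hypothetical stand-ins for what lm() does internally (here via the normal equations, which is adequate only for a tiny example like this one).

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Made-up tonnages in which week 7 is an obvious outlier.
tons = [10.0, 11.0, 12.5, 13.0, 14.2, 15.1, 40.0, 17.0, 18.3, 19.0]
weeks = range(1, len(tons) + 1)
bad7 = [1.0 if w == 7 else 0.0 for w in weeks]

# Fit 1: intercept + trend + indicator for week 7, using all ten weeks.
with_indicator = ols([[1.0, w, b] for w, b in zip(weeks, bad7)], tons)

# Fit 2: intercept + trend only, with week 7 deleted from the data.
kept = [(w, y) for w, y in zip(weeks, tons) if w != 7]
without_week7 = ols([[1.0, w] for w, _ in kept], [y for _, y in kept])

# Residual of week 7 under fit 1 (should be zero up to rounding).
residual7 = tons[6] - (with_indicator[0] + 7 * with_indicator[1] + with_indicator[2])

print(with_indicator[:2], without_week7, residual7)
```

The intercept and slope from the two fits should agree up to rounding error, and the coefficient on the indicator simply absorbs whatever is left over in week 7.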

If you wish to be fancy, you might want to include lagged versions of the series as predictors; see ?lag. (Look at the length of the lagged series if you get complaints from lm().)
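The remark about lengths comes down to this: a series and its own copy shifted by k weeks overlap in only n - k positions, so the response and the lagged predictor have to be trimmed to the rows they share. The sketch below shows that alignment on made-up numbers. It is in Python rather than Splus, the helper lag_pairs is hypothetical, and how lag() itself represents the shift is something to check in its help file.

```python
def lag_pairs(series, k=1):
    """Pair each value with the one k steps earlier: (series[t-k], series[t]).

    Only the overlapping part of the two series is returned, so the result
    has len(series) - k pairs.
    """
    if k < 1 or k >= len(series):
        raise ValueError("k must be between 1 and len(series) - 1")
    return list(zip(series[:-k], series[k:]))


# Made-up weekly tonnages.
tons = [10.0, 12.0, 11.0, 13.0, 12.0]
pairs = lag_pairs(tons, k=1)

# Five observations give only four usable (previous week, this week) pairs.
print(len(tons), len(pairs))
print(pairs)
```

The same pairs are what a lag-1 scatterplot displays: last week's tonnage on one axis, this week's on the other.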

Problem 5

Extrapolate the fitted model to 1990, to predict how the garbage production of New Haven might have continued if the pattern had not been broken by the introduction of recycling.
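Extrapolating means evaluating the fitted equation at time values beyond the end of the data. The sketch below does this for a trend-only line on made-up numbers; it is in Python rather than Splus, the helpers fit_trend and predict are hypothetical, and the month effects from Problem 4 are left out to keep it short (with them, you would also add the fitted effect for each future month).

```python
def fit_trend(times, values):
    """Least-squares intercept and slope for values = intercept + slope * time."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    return v_mean - slope * t_mean, slope


def predict(intercept, slope, new_times):
    """Evaluate the fitted line at time points outside the observed range."""
    return [intercept + slope * t for t in new_times]


# Made-up data lying exactly on the line 1 + 2 * time.
times = [1, 2, 3, 4, 5]
values = [3.0, 5.0, 7.0, 9.0, 11.0]
intercept, slope = fit_trend(times, values)

# Extrapolate two periods past the end of the data.
forecast = predict(intercept, slope, [6, 7])
print(intercept, slope, forecast)
```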

Homework Problem

Produce a short report that you might give to Mr. Wolf, showing the history of garbage collection for the seven years, and showing what you would have predicted for 1990.