Statistics 200: Lab 11 (Friday 11 April 1997)
Today's tasks:
Time series. The great New Haven garbage data.
The data set for this week's lab was collected some years ago as part
of a project to predict the amounts of garbage that would have been collected
in New Haven had it not been for the introduction of recycling. The help file
describes in more detail the form of the data, as it appears in the Garbage
data file. For Splus users, it might be more convenient to source() the data
from the dump file.
Problem 1
Create time series from the vector of weekly tonnages of garbage. Try both
cts() and rts(). Try plotting your series using both plot() and ts.plot().
You will need to read the help file to figure out the start dates and
frequencies. Compare with a simple plot() of the vector of tons. Do the plots
as time series add anything to your understanding? Use lag.plot() to see the
effects of dependence between nearby weeks.
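As a rough illustration of what a lag plot displays, the sketch below builds the (previous week, current week) pairs by hand and computes their correlation. It is written in plain Python rather than Splus, and the names lag_pairs, correlation and toy_tons are made up for this sketch; the numbers are invented and are not the New Haven data.

```python
from math import sqrt

def lag_pairs(series, k=1):
    """Return (x[t-k], x[t]) pairs, the points a lag plot at lag k would show."""
    if not 0 < k < len(series):
        raise ValueError("lag k must be between 1 and len(series) - 1")
    return list(zip(series[:-k], series[k:]))

def correlation(pairs):
    """Pearson correlation of the paired values."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)

# Invented weekly tonnages (NOT the New Haven data), for illustration only.
toy_tons = [100.0, 104.0, 103.0, 108.0, 111.0, 109.0, 115.0, 118.0]

pairs = lag_pairs(toy_tons, k=1)   # one pair fewer than the series length
lag1_corr = correlation(pairs)     # positive when nearby weeks move together
```

A cloud of points hugging a rising line, or equivalently a lag-1 correlation well above zero, is the sort of dependence between nearby weeks that the problem asks you to look for.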
Problem 2
Use the monthly data for 1986 to "estimate" the missing weekly tonnages.
Create a new series. Plot it. While you are at it, you might try to do
something about the "spikes" at the holidays. (Are they real effects of
garbage-strewn holidays, or are they just the result of one more day of garbage
collection? You might try to figure out which weeks had only six days of
collection and were followed by a week with eight days, then replace both by
their average.)
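The averaging suggested in the parenthesis can be sketched as follows. This is plain Python rather than Splus; the function name average_adjacent_weeks, the position of the short week and the tonnages are all hypothetical. Deciding which weeks actually had six and eight collection days is still up to you.

```python
def average_adjacent_weeks(tons, short_week):
    """Return a copy of tons in which week short_week and the week after it
    are both replaced by their average (positions are 0-based)."""
    if not 0 <= short_week < len(tons) - 1:
        raise IndexError("short_week must have a following week in the series")
    smoothed = list(tons)  # copy, so the original series is left untouched
    mean = (tons[short_week] + tons[short_week + 1]) / 2
    smoothed[short_week] = mean
    smoothed[short_week + 1] = mean
    return smoothed

# Invented tonnages: a six-day week (600) followed by an eight-day week (800).
toy_tons = [700.0, 600.0, 800.0, 710.0]
smoothed = average_adjacent_weeks(toy_tons, short_week=1)
```

Replacing both weeks by their average keeps the two-week total unchanged, so the adjustment only moves tonnage between the pair of weeks.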
Problem 3
Create a new series, showing the tons of garbage collected each month.
Plot it. Which series do you find more informative: weekly or monthly, cts() or
rts()?
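One possible way to build the monthly series is sketched below in plain Python rather than Splus. It assumes each weekly figure carries a single date and credits the whole week to the month of that date; the help file may suggest a different convention for weeks that straddle two months. The name monthly_totals, the start date and the tonnages are invented for the sketch.

```python
from collections import defaultdict
from datetime import date, timedelta

def monthly_totals(week_dates, tons):
    """Sum weekly tonnages by (year, month) of each week's date."""
    totals = defaultdict(float)
    for day, amount in zip(week_dates, tons):
        totals[(day.year, day.month)] += amount
    return dict(sorted(totals.items()))

# Hypothetical dates, one week apart, and invented tonnages.
start = date(1986, 1, 3)
week_dates = [start + timedelta(weeks=i) for i in range(6)]
toy_tons = [100.0, 110.0, 105.0, 95.0, 120.0, 115.0]

totals = monthly_totals(week_dates, toy_tons)
```

Note that months containing five dated weeks will show larger totals than months containing four, which is worth keeping in mind when comparing the monthly plot with the weekly one.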
Problem 4
Fit a linear model using lm(). Use a model formula to predict tons by a linear
trend in time (What do you use for time?) and a factor variable for month of the
year. Look at the residuals (plotted against what?). Try lag.plot() for the residuals.
Notice anything strange?
You might want to include an indicator function to take out the effect of any
troublesome outlier. For example, if week 7 were obnoxious, you could create
bad7 <- 1:length(tons) == 7
Including bad7 as one of the predictors would then effectively remove week 7
from the series.
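The claim that an indicator effectively removes a week can be checked numerically. The sketch below, in plain Python rather than Splus, fits a straight-line trend twice on invented data: once with an indicator column for week 7, and once with week 7 deleted. The helper names solve and least_squares and the toy series are made up for this illustration; lm() does the equivalent work for you.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

# Invented series: tons = 8 + 2 * week, except for an outlier in week 7.
toy_tons = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 50.0, 24.0, 26.0, 28.0]
weeks = range(1, len(toy_tons) + 1)
bad7 = [1.0 if w == 7 else 0.0 for w in weeks]

# Model A: intercept + trend + indicator for week 7, using every week.
rows_a = [[1.0, float(w), b] for w, b in zip(weeks, bad7)]
beta_a = least_squares(rows_a, toy_tons)

# Model B: intercept + trend only, with week 7 deleted from the data.
kept = [(w, t) for w, t in zip(weeks, toy_tons) if w != 7]
beta_b = least_squares([[1.0, float(w)] for w, _ in kept], [t for _, t in kept])

# Residual of week 7 under model A: the indicator absorbs it entirely.
resid7 = toy_tons[6] - sum(c * x for c, x in zip(beta_a, rows_a[6]))
```

In this toy case the intercept and slope agree between the two fits, and the week 7 residual in the indicator model is zero up to rounding, which is the sense in which the indicator takes the week out of the series.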
If you wish to be fancy, you might want to include lagged versions of the series
as predictors; see ?lag. (Look at the length of the lagged series if you get
complaints from lm().)
Problem 5
Extrapolate the fitted model to 1990, to predict how the garbage production of
New Haven might have continued if the pattern had not been broken by the
introduction of recycling.
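Extrapolating a trend-plus-month model amounts to letting the time index run past the end of the data while reusing the fitted month effects. The sketch below is plain Python rather than Splus and assumes a monthly series covering seven years (time indices 0 to 83), so that 1990 corresponds to indices 84 to 95. The function predict_tons and every coefficient are invented placeholders, not fitted values; substitute the coefficients from your own lm() fit.

```python
def predict_tons(time_index, month, intercept, slope, month_effects):
    """Fitted value from a linear trend in time plus a month-of-year effect."""
    return intercept + slope * time_index + month_effects[month]

# Placeholder coefficients (NOT estimates from the garbage data).
intercept = 1000.0
slope = 2.0
# January is the baseline month with effect 0; the other effects are invented.
month_effects = {m: 10.0 * (m - 1) for m in range(1, 13)}

n_observed = 7 * 12  # assumed: seven years of monthly data, indices 0..83
forecast = [
    predict_tons(n_observed + i, month, intercept, slope, month_effects)
    for i, month in enumerate(range(1, 13))
]
```

The same idea carries over to a weekly model: keep counting weeks past the last observed one and look up the month factor for each future week.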
Homework Problem
Produce a short report that you might give to Mr. Wolf, showing the history of
garbage collection for the seven years, and showing what you would have
predicted for 1990.