Give the columns of the data frame more memorable names, such as year, county, race, and so on. Try to use seq() and paste() to construct names like "0-4", "5-9", and so on for the age groups.
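One way to build the age-group labels is sketched below, in R syntax (S-Plus may differ in small ways). The five-year band width and the upper limit of 20 are assumptions chosen for illustration; use whatever range the age columns of NCI actually cover.

```r
# Lower end of each five-year age band (the upper limit here is illustrative only)
lower <- seq(0, 20, by = 5)

# Glue lower and upper ends together; sep controls the separator between them
age.names <- paste(lower, lower + 4, sep = "-")
print(age.names)
```

The resulting vector can then be combined with the other column names, for example with c("year", "county", "race", age.names), and assigned to names(NCI), provided the length matches the number of columns.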

The codes 1 through 12 are not very descriptive for the race/ethnicity variable. Create a character vector with entries like "WnHM", "WnHF", and so on as abbreviations. (Hint: With clever use of paste you can build the vector up from a vector with entries like "WnH", "WH", and a vector c("M","F").) Use factor() to create a factor object, with levels "WnH", "WH", ..., to replace the race column of NCI.
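The hint about paste can be sketched as follows, in R syntax. The ordering of the groups is an assumption taken from the column order of the sample output later in this sheet, and the short vector of codes is a made-up stand-in for the race column of NCI.

```r
# Group and sex abbreviations; the group order is assumed, not given by the data
groups <- c("WnH", "WH", "B", "Amer", "Asian", "Hisp")
sexes  <- c("M", "F")

# Repeat each group twice and cycle through the sexes: WnHM, WnHF, WHM, ...
race.labels <- paste(rep(groups, rep(2, 6)), rep(sexes, 6), sep = "")

# Hypothetical codes standing in for the race column of NCI
codes.demo <- c(1, 12, 3, 3)
race.f <- factor(codes.demo, levels = 1:12, labels = race.labels)
print(race.f)
```

Giving levels = 1:12 explicitly keeps all twelve categories in the factor even when some codes do not occur in the data.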

Use the information in the list of counties to turn NCI$county into a factor as well, with the county names as labels (or CT for the whole of Connecticut).

Look at the attributes() and codes() of NCI$race. How does Splus represent a factor? Try sort(levels(NCI$race))[codes(NCI$race)]. What do you notice?

Can you explain how factors work?
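For readers working in R rather than S-Plus, a small sketch of the same experiment follows. As far as I know codes() is not available in current R, so as.integer() is used here instead; it may not behave exactly as codes() does in S-Plus, so treat this as an illustration of the idea and not as the answer for S-Plus. The factor f is a made-up example.

```r
# A small factor whose levels are deliberately not in alphabetical order
f <- factor(c("b", "a", "c", "a"), levels = c("b", "a", "c"))

attributes(f)                    # the levels and the class are stored as attributes
as.integer(f)                    # the underlying integer codes: 1 2 3 2
levels(f)[as.integer(f)]         # indexing the levels by the codes recovers the data
sort(levels(f))[as.integer(f)]   # indexing the sorted levels does not, in this case
```

In this R sketch a factor is an integer vector of positions in the levels attribute, so the codes only make sense together with the levels in their stored order.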

Use apply() for a submatrix of NCI to generate a vector *pops*
of population totals for each row of NCI. Use tapply() with various
factors or lists of factors to create a three-dimensional array called
*nci.array* showing total population in each of the 12
race/ethnicity categories cross-classified by county and year. Use
the aperm() function, if necessary, to ensure that nci.array prints to
the screen as a sequence of matrices (one for each county) with rows
labelled by year and columns by race. For example, one matrix should
look like

    , , New Haven

         WnHM   WnHF   WHM   WHF    BM    BF AmerM AmerF AsianM AsianF HispM HispF
    90 317902 344393 22754 23320 39310 44544   786   845   5455   5290 25344 25962
    91 315996 342301 23413 24057 40003 45282   801   856   5714   5567 26142 26856
    92 313944 340333 24012 24621 40184 45473   809   872   5897   5851 26845 27511
    93 311333 337291 24737 25275 40728 46116   833   901   6110   6175 27699 28309
    94 309159 334720 25128 25728 40966 46327   848   903   6285   6413 28176 28859
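The apply(), tapply(), and aperm() steps can be tried out on a toy data frame before tackling NCI. The sketch below is in R syntax; the data frame toy, its two age columns a0 and a5, and all the counts in it are invented for illustration and are not taken from NCI.

```r
# Toy stand-in for NCI: 2 years x 2 counties x 2 race categories, two age columns
toy <- expand.grid(year = 90:91,
                   county = c("CT", "New Haven"),
                   race = c("WnHM", "WnHF"))
toy$a0 <- 1:8     # made-up counts for the first age band
toy$a5 <- 11:18   # made-up counts for the second age band

# Row totals over the age columns only
pops <- apply(as.matrix(toy[, c("a0", "a5")]), 1, sum)

# Cross-classify the totals: dimensions come out as race x year x county
arr <- tapply(pops, list(toy$race, toy$year, toy$county), sum)

# Reorder to year x race x county so each printed slice is one county
nci.array <- aperm(arr, c(2, 1, 3))
print(nci.array)
```

The second argument of aperm() gives, for each dimension of the result, the dimension of the input it should come from, so c(2, 1, 3) swaps the first two dimensions and leaves county last.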

               90  91  92  93  94
    CT        7.1 7.2 7.3 7.4 7.4
    Fairfield 8.2 8.3 8.3 8.5 8.5
    Hartford  8.8 8.9 9.0 9.1 9.2
    ...
    Windham   1.0 1.0 1.0 1.0 1.0

Hint: The cross-tabulations generated by tapply can be assigned to objects and then manipulated as arrays. Also see ?round.

- one variable containing the values from the body of the matrix
- factor variables identifying the original dimnames.

For example, here is the output from a function called matrix.to.factor that I wrote to solve the two-dimensional version of this problem. Hint: Try filling up an array with the factor names, then using factor().

Make sure you can recreate the original matrix (using tapply and the factors) from the output of your function.
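A minimal sketch of the two-dimensional case, following the hint, is given below in R syntax. It is one possible implementation written for this illustration, not the author's own matrix.to.factor; the output column names value, row, and col and the test matrix m are assumptions.

```r
# Sketch: turn a matrix with dimnames into one value column plus two factors
matrix.to.factor <- function(m) {
  rn <- dimnames(m)[[1]]
  cn <- dimnames(m)[[2]]
  # Fill arrays of the same shape as m with the row and column names
  row.names.arr <- matrix(rn, nrow(m), ncol(m))
  col.names.arr <- matrix(cn, nrow(m), ncol(m), byrow = TRUE)
  data.frame(value = as.vector(m),
             row = factor(as.vector(row.names.arr), levels = rn),
             col = factor(as.vector(col.names.arr), levels = cn))
}

# Made-up test matrix
m <- matrix(1:6, 2, 3, dimnames = list(c("a", "b"), c("x", "y", "z")))
long <- matrix.to.factor(m)
print(long)

# Round trip: rebuild the matrix from the values and the two factors
back <- tapply(long$value, list(long$row, long$col), sum)
print(back)
```

Passing levels = rn and levels = cn keeps the original row and column order, so the matrix rebuilt by tapply() has its rows and columns in the same order as m.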

For New Haven County, draw graphs for the cumulative proportions of white-nonhispanic, black, and hispanic populations. Put age on the horizontal axis and the fraction of each population younger than each age on the vertical axis. Draw the three curves on the same plot, using different line types for each group. Hint: build your function around matplot(). You might find it easier to create several functions to solve the whole problem.
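The plotting step might be organised along the following lines, shown here in R syntax. The helper cum.prop, the age cut-points, and the counts are all invented for illustration; the real counts would come from the New Haven rows of NCI, summed over sex and year as the problem requires.

```r
# Hypothetical helper: cumulative proportion of a vector of counts
cum.prop <- function(counts) cumsum(counts) / sum(counts)

# Made-up counts by age band (rows) for three groups (columns)
ages <- seq(5, 20, by = 5)   # upper end of each age band
counts <- cbind(WnH  = c(10, 20, 30, 40),
                B    = c(25, 25, 25, 25),
                Hisp = c(40, 30, 20, 10))

# One cumulative curve per column
props <- apply(counts, 2, cum.prop)

# All three curves on one plot, distinguished by line type
matplot(ages, props, type = "l", lty = 1:3, col = 1,
        xlab = "Age", ylab = "Fraction younger than age")
legend("bottomright", legend = colnames(props), lty = 1:3)
```

Keeping the cumulative-proportion calculation in its own small function makes it easy to check against a few hand-computed values before plotting.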