Give the columns of the data frame more memorable names, such as year, county, race, and so on. Try to use seq() and paste() to construct names like "0--4", "5--9", for the age groups.
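As a rough illustration of the seq() and paste() idea, here is a minimal sketch in R syntax (S-PLUS may differ in small details). The age bands and the toy data frame are made up for the example and are not the actual layout of NCI.

```r
# Hypothetical age bands with lower bounds 0, 5, ..., 20;
# the real NCI data may use a different range.
lower <- seq(0, 20, by = 5)
age.names <- paste(lower, lower + 4, sep = "--")
print(age.names)

# A toy data frame with 7 columns standing in for the real one.
toy <- as.data.frame(matrix(1:14, nrow = 2))
names(toy) <- c("year", "county", age.names)
print(names(toy))
```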
The codes 1 through 12 are not very descriptive for the race/ethnicity variable. Create a character vector with entries like "WnHM", "WnHF", and so on as abbreviations. (Hint: With clever use of paste you can build the vector up from a vector with entries like "WnH", "WH", and a vector c("M","F").) Use factor() to create a factor object, with levels "WnH", "WH", ..., to replace the race column of NCI.
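One possible way to follow the hint, sketched in R syntax on a shortened, hypothetical list of groups (three instead of six) and a made-up vector of codes. The `each` argument of rep() is an R convenience and may not exist in older versions of S-PLUS.

```r
# Shortened, hypothetical group abbreviations (the exercise has six).
groups <- c("WnH", "WH", "B")
sexes  <- c("M", "F")

# Repeat each group twice and cycle through the sexes, then glue together.
labels <- paste(rep(groups, each = 2),
                rep(sexes, times = length(groups)),
                sep = "")
print(labels)

# Made-up numeric codes standing in for the race column.
codes.toy <- c(1, 2, 3, 6, 1)
race.toy <- factor(codes.toy, levels = 1:6, labels = labels)
print(race.toy)
```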
Use the information in the list of counties to turn NCI$county into a factor as well, labelled by county name (or CT for the whole of Connecticut).
Look at the attributes() and codes() of NCI$race. How does Splus represent a factor? Try sort(levels(NCI$race))[codes(NCI$race)]. What do you notice?
Can you explain how factors work?
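For experimenting with a small factor before looking at NCI$race, here is a sketch in R syntax. It assumes current R, where codes() is no longer available; as.integer() and unclass() are used instead and may not behave identically to codes() in S-PLUS.

```r
# A tiny factor to inspect; the values are arbitrary.
f <- factor(c("b", "a", "c", "a"))

print(attributes(f))   # shows the levels and the class
print(unclass(f))      # the underlying integers plus the levels attribute
print(as.integer(f))   # just the integer codes

# Indexing the levels by the codes gives back the original strings.
print(levels(f)[as.integer(f)])
```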
Use apply() for a submatrix of NCI to generate a vector pops of population totals for each row of NCI. Use tapply() with various factors or lists of factors to create a three-dimensional array called nci.array showing total population in each of the 12 race/ethnicity categories cross-classified by county and year. Use the aperm() function, if necessary, to ensure that nci.array prints to the screen as a sequence of matrices (one for each county) with rows labelled by year and columns by race. For example, one matrix should look like:
, , New Haven

     WnHM   WnHF   WHM   WHF    BM    BF AmerM AmerF AsianM AsianF HispM HispF
90 317902 344393 22754 23320 39310 44544   786   845   5455   5290 25344 25962
91 315996 342301 23413 24057 40003 45282   801   856   5714   5567 26142 26856
92 313944 340333 24012 24621 40184 45473   809   872   5897   5851 26845 27511
93 311333 337291 24737 25275 40728 46116   833   901   6110   6175 27699 28309
94 309159 334720 25128 25728 40966 46327   848   903   6285   6413 28176 28859
           90  91  92  93  94
CT        7.1 7.2 7.3 7.4 7.4
Fairfield 8.2 8.3 8.3 8.5 8.5
Hartford  8.8 8.9 9.0 9.1 9.2
...
Windham   1.0 1.0 1.0 1.0 1.0

Hint: The cross-tabulations generated by tapply can be assigned to objects, then manipulated as arrays. Also: ?round
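The mechanics of apply(), tapply(), aperm() and round() can be tried out on a toy data frame first. The sketch below is in R syntax; the column names n1 and n2, the counties "A" and "B", and the races "X" and "Y" are all hypothetical, and the final rounded table is only one illustrative manipulation, not necessarily the quantity tabulated above.

```r
# Toy data: 2 years x 2 counties x 2 races, with two count columns.
toy <- data.frame(
  year   = rep(c(90, 91), each = 4),
  county = rep(rep(c("A", "B"), each = 2), times = 2),
  race   = rep(c("X", "Y"), times = 4),
  n1     = 1:8,
  n2     = 11:18
)

# Row totals over the count columns only.
pops <- apply(as.matrix(toy[, c("n1", "n2")]), 1, sum)

# Cross-classify the totals: race x year x county.
arr <- tapply(pops, list(toy$race, toy$year, toy$county), sum)

# Reorder the dimensions to year x race x county, so that each
# printed slice is one county with years as rows and races as columns.
arr2 <- aperm(arr, c(2, 1, 3))
print(arr2)

# A two-way table stored as an object, then manipulated like any array.
by.cy <- tapply(pops, list(toy$county, toy$year), sum)
print(round(100 * by.cy / sum(by.cy), 1))
```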
For example, here is the output from a function called matrix.to.factor that I wrote to solve the two-dimensional version of this problem. Hint: Try filling up an array with the factor names, then using factor().
Make sure you can recreate the original matrix (using tapply and the factors) from the output of your function.
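The round trip can be checked on a small matrix. The following is a sketch in R syntax of one possible approach, not the matrix.to.factor function mentioned above; the matrix m and its dimnames are invented for the example, and row(), col(), rownames() and colnames() are assumed to behave as they do in R.

```r
# A small matrix with row and column labels.
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))

# row(m) and col(m) hold each cell's row and column index, so indexing
# the label vectors by them gives one label per cell.
row.f <- factor(rownames(m)[row(m)], levels = rownames(m))
col.f <- factor(colnames(m)[col(m)], levels = colnames(m))

# Long form: one value per cell plus its two factors.
long <- data.frame(value = as.vector(m), row = row.f, col = col.f)
print(long)

# Rebuild the matrix from the values and the factors.
back <- tapply(long$value, list(long$row, long$col), sum)
print(back)
```

Fixing the factor levels to the original label order keeps the rebuilt rows and columns in the same order as m.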