SOLUTIONS TO POLLARD SHEETS 6 and 8

(6.1) Of the 1747 persons who answered the questionnaire, at least 156
indicated they were black. If the responses were like a random sample from a
large population with 9.09% black, we could treat the number of blacks in the
sample as having a Binomial(n = 1747, p = 0.0909) distribution. The mean of
that Binomial distribution equals 158.8, and the standard deviation is about
12. An observed 156 (or 160, if you decide to count the multiple responses) is
within 1 standard deviation of the predicted mean. You could repeat the
calculation with n = 1726, eliminating the 21 persons who did not answer the
race question, but the conclusion would be about the same.

A departure of less than one standard deviation from a value calculated from a
model is not enough to raise doubts about the model. (Of course there might be
other reasons to doubt the model, but that would be another story.)

---------------------------------------------------------------

(6.2) I created columns

    c1 = 'undel'    (number of undeliverables each month)
    c2 = 'received' (number mailed minus number undeliverable)
    c3 = 'mailed'   (total number mailed each month)
    c4-c6 = first six rows from c1-c3
    c7-c9 = last six rows from c1-c3

Calculate the proportions of undeliverables for each six-month period:

MTB > let k1 = sum(c4)/sum(c6)
MTB > let k2 = sum(c7)/sum(c9)
MTB > name k1 'p1hat' k2 'p2hat'

Estimate variances for p1hat and p2hat, then express p2hat - p1hat as a
multiple of its estimated standard deviation:

MTB > let k11 = k1*(1-k1)/sum(c6)
MTB > let k12 = k2*(1-k2)/sum(c9)
MTB > name k11 'var1' k12 'var2'
MTB > let k20 = (k2-k1)/sqrt(var2+var1)
MTB > print k1,k2,k11,k12,k20

p1hat    0.129641
p2hat    0.153948
var1     0.000002647
var2     0.000002962
K20      10.263

The observed difference in proportions is more than 10 standard deviations
from zero. It is implausible that the (average) undeliverability rates for the
two six-month periods are the same.
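The arithmetic in (6.1) and (6.2) is easy to check outside Minitab. Here is a
sketch in Python; the monthly undeliverable and mailed counts are copied from
the table printed in (8.1) below.

```python
from math import sqrt

# (6.1) Binomial check: mean and standard deviation under the model
n, p = 1747, 0.0909
mean = n * p                    # about 158.8
sd = sqrt(n * p * (1 - p))      # about 12
print(mean, sd, (156 - mean) / sd)   # observed 156 is well within 1 sd

# (6.2) Compare undeliverability rates for the two six-month periods.
# Monthly counts (Sep94 through Aug95), copied from the table in (8.1).
undel  = [775, 776, 917, 925, 963, 1171, 1502, 951, 1066, 1130, 991, 1130]
mailed = [6393, 6525, 7449, 7007, 7149, 8110,
          10532, 6273, 7028, 6802, 6061, 7280]

p1hat = sum(undel[:6]) / sum(mailed[:6])    # first six months
p2hat = sum(undel[6:]) / sum(mailed[6:])    # last six months
var1 = p1hat * (1 - p1hat) / sum(mailed[:6])
var2 = p2hat * (1 - p2hat) / sum(mailed[6:])
z = (p2hat - p1hat) / sqrt(var1 + var2)
print(p1hat, p2hat, z)    # roughly 0.1296, 0.1539, and 10.26
```

The printed values agree with the Minitab constants p1hat, p2hat, and K20.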
You could also turn the calculation around, and give a confidence interval for
the difference in rates.

---------------------------------------------------------------

(8.1) As with (6.2), I created columns

    c1 = 'undel'    (number of undeliverables each month)
    c2 = 'received' (number mailed minus number undeliverable)
    c3 = 'mailed'   (total number mailed each month)

Calculate some marginal totals, and give them names:

MTB > let k1 = sum(c1)
MTB > let k2 = sum(c2)
MTB > let k3 = k1+k2
MTB > name k1 'undel.tot' k2 'rec.tot' k3 'n'
MTB > print k1-k3

undel.tot   12297.0    # total number of undeliverable summonses
rec.tot     74312.0    # total number of summonses (presumed) received
n           86609.0    # total number of summonses mailed

Calculate expected counts and (observed - expected)/sqrt(expected) for each
column:

MTB > let c4 = c3*k1/k3        # expected undeliverable
MTB > let c5 = c3*k2/k3        # expected received
MTB > let c6 = (c1-c4)/sqrt(c4)
MTB > let c7 = (c2-c5)/sqrt(c5)
MTB > print 'month',c3,c1,c2,c4-c7

 Row  month   mailed  undel  received    exp.u    exp.r  undel.chi   rec.chi
   1  Sep94     6393    775      5618   907.70  5485.30   -4.40444   1.79168
   2  Oct94     6525    776      5749   926.44  5598.56   -4.94255   2.01058
   3  Nov94     7449    917      6532  1057.63  6391.37   -4.32428   1.75907
   4  Dec94     7007    925      6082   994.87  6012.13   -2.21531   0.90116
   5  Jan95     7149    963      6186  1015.04  6133.96   -1.63329   0.66441
   6  Feb95     8110   1171      6939  1151.48  6958.52    0.57520  -0.23398
   7  Mar95    10532   1502      9030  1495.36  9036.64    0.17160  -0.06980
   8  Apr95     6273    951      5322   890.66  5382.34    2.02189  -0.82248
   9  May95     7028   1066      5962   997.86  6030.14    2.15721  -0.87753
  10  Jun95     6802   1130      5672   965.77  5836.23    5.28472  -2.14977
  11  Jul95     6061    991      5070   860.56  5200.44    4.44657  -1.80882
  12  Aug95     7280   1130      6150  1033.64  6246.36    2.99731  -1.21928

Notice the strong pattern in the "chi" values. An assumption of a fixed
undeliverability rate for the whole court year results in a consistent
overestimation of the undeliverables for the earlier part of the court year,
and a consistent underestimation for the later part of the year.
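The expected counts and the signed "chi" residuals can be reproduced with a
few lines of Python (a sketch, not the Minitab session itself; the monthly
counts are typed in from the table above):

```python
from math import sqrt

# Monthly counts (Sep94 through Aug95), from the table above
undel    = [775, 776, 917, 925, 963, 1171, 1502, 951, 1066, 1130, 991, 1130]
received = [5618, 5749, 6532, 6082, 6186, 6939,
            9030, 5322, 5962, 5672, 5070, 6150]
mailed   = [u + r for u, r in zip(undel, received)]

undel_tot = sum(undel)     # 12297, total undeliverable
n = sum(mailed)            # 86609, total mailed

rows = []
for m, u, r in zip(mailed, undel, received):
    exp_u = m * undel_tot / n            # expected undeliverable
    exp_r = m - exp_u                    # expected received
    chi_u = (u - exp_u) / sqrt(exp_u)    # signed residual, undel column
    chi_r = (r - exp_r) / sqrt(exp_r)    # signed residual, received column
    rows.append((exp_u, chi_u, chi_r))
    print(f"{exp_u:8.2f} {chi_u:9.5f} {chi_r:9.5f}")

# Summing the squared residuals over both columns gives the usual
# chi-square statistic for the 12 x 2 table (11 degrees of freedom)
chisq = sum(cu * cu + cr * cr for _, cu, cr in rows)
print(chisq)
```

Keeping the residuals themselves, rather than only their squares, is what
lets the monthly trend show up.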
The pattern strongly suggests a rate that increases over time. Compare with
the Minitab output for the test of no association:

MTB > ChiSquare 'undel' 'received'.

Chi-Square Test

Expected counts are printed below observed counts

            undel  received    Total
    1         775      5618     6393
           907.70   5485.30
    2         776      5749     6525
           926.44   5598.56
    3         917      6532     7449
          1057.63   6391.37
    4         925      6082     7007
           994.87   6012.13
    5         963      6186     7149
          1015.04   6133.96
    6        1171      6939     8110
          1151.48   6958.52
    7        1502      9030    10532
          1495.36   9036.64
    8         951      5322     6273
           890.66   5382.34
    9        1066      5962     7028
           997.86   6030.14
   10        1130      5672     6802
           965.77   5836.23
   11         991      5070     6061
           860.56   5200.44
   12        1130      6150     7280
          1033.64   6246.36
Total       12297     74312    86609

Chi-Sq =  19.399 +  3.210 +
          24.429 +  4.042 +
          18.699 +  3.094 +
           4.908 +  0.812 +
           2.668 +  0.441 +
           0.331 +  0.055 +
           0.029 +  0.005 +
           4.088 +  0.676 +
           4.654 +  0.770 +
          27.928 +  4.622 +
          19.772 +  3.272 +
           8.984 +  1.487 = 158.375
DF = 11, P-Value = 0.000

Notice that the last table gives just the squares of the "chi" values. It is
a waste to throw away the information in the signs.

---------------------------------------------------------------

(8.2) The macro worked pretty much as advertised. You needed to increase the
7 repetitions to a value of a few hundred to get the histogram looking like
the theoretical chi-square density.

One nasty little bug surfaced: if you chose any of the cell probabilities
very small, then you had a reasonable chance (over many repetitions) of
getting a zero count. That would mess up the comparison with the expected
counts (Minitab doesn't like it if you subtract two columns with different
lengths). Drew Carter fixed the bug (his version is also on the I: drive), at
the cost of much slower running times.

Some of you discovered that do loops only work in macros. You can't type
"do ..." in the Session window.

To draw the theoretical density: Fill a column (say c1) with values like
0, 0.1, 0.2, 0.3, 0.4, 0.5, ... (use patterned data).
Use the menu for probability distributions to evaluate the chi-square density at the points in c1, saving values in c2. Plot c2 against c1, with the connect option (use the Symbols pull down menu).
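Both steps of (8.2) can also be sketched outside Minitab: evaluating the
theoretical chi-square density on a grid, and simulating the Pearson
statistic a few hundred times. The cell probabilities, sample size, and
repetition count below are illustrative choices, not values from the sheet.

```python
import random
from math import exp, gamma

def chisq_density(x, df):
    """Chi-square density with df degrees of freedom, for x > 0."""
    return x ** (df / 2 - 1) * exp(-x / 2) / (2 ** (df / 2) * gamma(df / 2))

# Evaluate the density on a grid, like the patterned-data column c1
# (start at 0.1 so the formula is safe for every df, including df = 1)
grid = [i / 10 for i in range(1, 201)]        # 0.1, 0.2, ..., 20.0
dens = [chisq_density(x, df=2) for x in grid]

# A plain-Python version of the simulation: draw multinomial counts,
# compute the Pearson statistic, and repeat a few hundred times
random.seed(1)
probs = [0.2, 0.3, 0.5]   # hypothetical cell probabilities
n = 100                   # sample size per repetition
stats = []
for _ in range(500):
    counts = [0] * len(probs)
    for _ in range(n):
        u, cum = random.random(), 0.0
        for i, p in enumerate(probs):
            cum += p
            if u < cum:
                counts[i] += 1
                break
    stats.append(sum((c - n * p) ** 2 / (n * p)
                     for c, p in zip(counts, probs)))
# A histogram of 'stats' should resemble the chi-square density with
# 2 degrees of freedom (number of cells minus 1), the curve traced by 'dens'
```

With only 7 repetitions the histogram is too ragged to compare with the
density; a few hundred repetitions, as above, smooths it out.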