SOLUTIONS TO POLLARD SHEETS 6 and 8

(6.1) Of the 1747 persons who answered the questionnaire, at least 156
indicated they were black. If the responses were like a random sample from a
large population with 9.09% black, we could treat the number of blacks in the
sample as having a Binomial(n = 1747, p = 0.0909) distribution. The mean of
that Binomial distribution equals 158.8, and the standard deviation is about
12. An observed 156 (or 160, if you decide to count the multiple responses) is
within 1 standard deviation of the predicted mean. You could repeat the
calculation with n = 1726, eliminating the 21 persons who did not answer the
race question, but the conclusion would be about the same.

A departure of less than one standard deviation from a value calculated from a
model is not enough to raise doubts about the model. (Of course there might be
other reasons to doubt the model, but that would be another story.)

---------------------------------------------------------------

(6.2) I created columns

    c1 = 'undel'    (number of undeliverables each month)
    c2 = 'received' (number mailed minus number undeliverable)
    c3 = 'mailed'   (total number mailed each month)
    c4-c6 = first six rows from c1-c3
    c7-c9 = last six rows from c1-c3

Calculate the proportions of undeliverables for each six-month period:

MTB > let k1 = sum(c4)/sum(c6)
MTB > let k2 = sum(c7)/sum(c9)
MTB > name k1 'p1hat' k2 'p2hat'

Estimate variances for p1hat and p2hat, then express p2hat - p1hat as a
multiple of its estimated standard deviation:

MTB > let k11 = k1*(1-k1)/sum(c6)
MTB > let k12 = k2*(1-k2)/sum(c9)
MTB > name k11 'var1' k12 'var2'
MTB > let k20 = (k2-k1)/sqrt(var2+var1)
MTB > print k1,k2,k11,k12,k20

p1hat    0.129641
p2hat    0.153948
var1     0.000002647
var2     0.000002962
K20      10.263

The observed difference in proportions is more than 10 standard deviations
from zero. It is implausible that the (average) undeliverability rates for the
two six-month periods are the same.
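The arithmetic in (6.1) and (6.2) is easy to check outside Minitab. Here is a
sketch in Python; the monthly undeliverable and mailed counts are copied from
the table printed in (8.1) below.

```python
from math import sqrt

# (6.1) Binomial check: mean and standard deviation under the model
n, p = 1747, 0.0909
mean = n * p                    # about 158.8
sd = sqrt(n * p * (1 - p))      # about 12
print(mean, sd, (156 - mean) / sd)   # observed 156 is well within 1 sd

# (6.2) Compare undeliverability rates for the two six-month periods.
# Monthly counts (Sep94 through Aug95), copied from the table in (8.1).
undel  = [775, 776, 917, 925, 963, 1171, 1502, 951, 1066, 1130, 991, 1130]
mailed = [6393, 6525, 7449, 7007, 7149, 8110,
          10532, 6273, 7028, 6802, 6061, 7280]

p1hat = sum(undel[:6]) / sum(mailed[:6])    # first six months
p2hat = sum(undel[6:]) / sum(mailed[6:])    # last six months
var1 = p1hat * (1 - p1hat) / sum(mailed[:6])
var2 = p2hat * (1 - p2hat) / sum(mailed[6:])
z = (p2hat - p1hat) / sqrt(var1 + var2)
print(p1hat, p2hat, z)    # roughly 0.1296, 0.1539, and 10.26
```

The printed values agree with the Minitab constants p1hat, p2hat, and K20.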
You could also turn the calculation around, and give a confidence interval for
the difference in rates.

---------------------------------------------------------------

(8.1) As with (6.2), I created columns

    c1 = 'undel'    (number of undeliverables each month)
    c2 = 'received' (number mailed minus number undeliverable)
    c3 = 'mailed'   (total number mailed each month)

Calculate some marginal totals, and give them names:

MTB > let k1 = sum(c1)
MTB > let k2 = sum(c2)
MTB > let k3 = k1+k2
MTB > name k1 'undel.tot' k2 'rec.tot' k3 'n'
MTB > print k1-k3

undel.tot   12297.0    # total number of undeliverable summonses
rec.tot     74312.0    # total number of summonses (presumed) received
n           86609.0    # total number of summonses mailed

Calculate expected counts and (observed - expected)/sqrt(expected) for each
column:

MTB > let c4 = c3*k1/k3        # expected undeliverable
MTB > let c5 = c3*k2/k3        # expected received
MTB > let c6 = (c1-c4)/sqrt(c4)
MTB > let c7 = (c2-c5)/sqrt(c5)
MTB > print 'month',c3,c1,c2,c4-c7

 Row  month   mailed  undel  received    exp.u    exp.r  undel.chi   rec.chi
   1  Sep94     6393    775      5618   907.70  5485.30   -4.40444   1.79168
   2  Oct94     6525    776      5749   926.44  5598.56   -4.94255   2.01058
   3  Nov94     7449    917      6532  1057.63  6391.37   -4.32428   1.75907
   4  Dec94     7007    925      6082   994.87  6012.13   -2.21531   0.90116
   5  Jan95     7149    963      6186  1015.04  6133.96   -1.63329   0.66441
   6  Feb95     8110   1171      6939  1151.48  6958.52    0.57520  -0.23398
   7  Mar95    10532   1502      9030  1495.36  9036.64    0.17160  -0.06980
   8  Apr95     6273    951      5322   890.66  5382.34    2.02189  -0.82248
   9  May95     7028   1066      5962   997.86  6030.14    2.15721  -0.87753
  10  Jun95     6802   1130      5672   965.77  5836.23    5.28472  -2.14977
  11  Jul95     6061    991      5070   860.56  5200.44    4.44657  -1.80882
  12  Aug95     7280   1130      6150  1033.64  6246.36    2.99731  -1.21928

Notice the strong pattern in the "chi" values. An assumption of a fixed
undeliverability rate for the whole court year results in a consistent
overestimation of the undeliverables for the earlier part of the court year,
and a consistent underestimation for the later part of the year.
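The expected counts and the signed "chi" residuals can be reproduced with a
few lines of Python (a sketch, not the Minitab session itself; the monthly
counts are typed in from the table above):

```python
from math import sqrt

# Monthly counts (Sep94 through Aug95), from the table above
undel    = [775, 776, 917, 925, 963, 1171, 1502, 951, 1066, 1130, 991, 1130]
received = [5618, 5749, 6532, 6082, 6186, 6939,
            9030, 5322, 5962, 5672, 5070, 6150]
mailed   = [u + r for u, r in zip(undel, received)]

undel_tot = sum(undel)     # 12297, total undeliverable
n = sum(mailed)            # 86609, total mailed

rows = []
for m, u, r in zip(mailed, undel, received):
    exp_u = m * undel_tot / n            # expected undeliverable
    exp_r = m - exp_u                    # expected received
    chi_u = (u - exp_u) / sqrt(exp_u)    # signed residual, undel column
    chi_r = (r - exp_r) / sqrt(exp_r)    # signed residual, received column
    rows.append((exp_u, chi_u, chi_r))
    print(f"{exp_u:8.2f} {chi_u:9.5f} {chi_r:9.5f}")

# Summing the squared residuals over both columns gives the usual
# chi-square statistic for the 12 x 2 table (11 degrees of freedom)
chisq = sum(cu * cu + cr * cr for _, cu, cr in rows)
print(chisq)
```

Keeping the residuals themselves, rather than only their squares, is what
lets the monthly trend show up.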
The pattern strongly suggests a rate that increases over time. Compare with
the Minitab output for the test of no association:

MTB > ChiSquare 'undel' 'received'.

Chi-Square Test

Expected counts are printed below observed counts

            undel  received    Total
    1         775      5618     6393
           907.70   5485.30
    2         776      5749     6525
           926.44   5598.56
    3         917      6532     7449
          1057.63   6391.37
    4         925      6082     7007
           994.87   6012.13
    5         963      6186     7149
          1015.04   6133.96
    6        1171      6939     8110
          1151.48   6958.52
    7        1502      9030    10532
          1495.36   9036.64
    8         951      5322     6273
           890.66   5382.34
    9        1066      5962     7028
           997.86   6030.14
   10        1130      5672     6802
           965.77   5836.23
   11         991      5070     6061
           860.56   5200.44
   12        1130      6150     7280
          1033.64   6246.36
Total       12297     74312    86609

Chi-Sq =  19.399 +  3.210 +
          24.429 +  4.042 +
          18.699 +  3.094 +
           4.908 +  0.812 +
           2.668 +  0.441 +
           0.331 +  0.055 +
           0.029 +  0.005 +
           4.088 +  0.676 +
           4.654 +  0.770 +
          27.928 +  4.622 +
          19.772 +  3.272 +
           8.984 +  1.487 = 158.375
DF = 11, P-Value = 0.000

Notice that the last table gives just the squares of the "chi" values. It is
a waste to throw away the information in the signs.

---------------------------------------------------------------

(8.2) The macro worked pretty much as advertised. You needed to increase the
7 repetitions to a value of a few hundred to get the histogram looking like
the theoretical chi-square density.

One nasty little bug surfaced: if you chose any of the cell probabilities
very small, then you had a reasonable chance (over many repetitions) of
getting a zero count. That would mess up the comparison with the expected
counts (Minitab doesn't like it if you subtract two columns with different
lengths). Drew Carter fixed the bug (his version is also on the I: drive), at
the cost of much slower running times.

Some of you discovered that do loops only work in macros. You can't type
"do ..." in the Session window.

To draw the theoretical density: Fill a column (say c1) with values like
0, 0.1, 0.2, 0.3, 0.4, 0.5, ... (use patterned data).
Use the menu for probability distributions to evaluate the chi-square density at the points in c1, saving values in c2. Plot c2 against c1, with the connect option (use the Symbols pull down menu).
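Both steps of (8.2) can also be sketched outside Minitab: evaluating the
theoretical chi-square density on a grid, and simulating the Pearson
statistic a few hundred times. The cell probabilities, sample size, and
repetition count below are illustrative choices, not values from the sheet.

```python
import random
from math import exp, gamma

def chisq_density(x, df):
    """Chi-square density with df degrees of freedom, for x > 0."""
    return x ** (df / 2 - 1) * exp(-x / 2) / (2 ** (df / 2) * gamma(df / 2))

# Evaluate the density on a grid, like the patterned-data column c1
# (start at 0.1 so the formula is safe for every df, including df = 1)
grid = [i / 10 for i in range(1, 201)]        # 0.1, 0.2, ..., 20.0
dens = [chisq_density(x, df=2) for x in grid]

# A plain-Python version of the simulation: draw multinomial counts,
# compute the Pearson statistic, and repeat a few hundred times
random.seed(1)
probs = [0.2, 0.3, 0.5]   # hypothetical cell probabilities
n = 100                   # sample size per repetition
stats = []
for _ in range(500):
    counts = [0] * len(probs)
    for _ in range(n):
        u, cum = random.random(), 0.0
        for i, p in enumerate(probs):
            cum += p
            if u < cum:
                counts[i] += 1
                break
    stats.append(sum((c - n * p) ** 2 / (n * p)
                     for c, p in zip(counts, probs)))
# A histogram of 'stats' should resemble the chi-square density with
# 2 degrees of freedom (number of cells minus 1), the curve traced by 'dens'
```

With only 7 repetitions the histogram is too ragged to compare with the
density; a few hundred repetitions, as above, smooths it out.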