Dear xxxxxxx,

Some time ago you sent me a letter describing a problem in the interpretation of a task force decision. I found your questions so interesting that I took the liberty of setting my statistics class the problem of writing a helpful response. (We were studying questions of experimental design and sampling at the time.) It is not easy to sort through the issues that you raise. I did remove information that would identify you before I gave the letter to the class, so you don't have to worry about any hate mail for setting such a tricky practical problem.

Let me give you the bad news first. I don't believe you can salvage anything by means of probabilistic arguments. The logic behind the probabilistic reasoning requires a random sampling situation, or at least a rough approximation to random sampling, or some form of statistical experimental design (for which randomization is a key ingredient). As you pointed out, the task force was in no way a random sample from a wider constituency (line 14: judgmentally selected; line 32: sample was not randomly selected). Your procedure was not at all like sending out a questionnaire to a randomly selected subset of the membership.

I suspect you have been misled into looking for a statistical analysis by the similarity between certain technical terms (such as 'significance' or 'bias') and their everyday meanings. For example, on lines 41--43: "this must be significant and useful to our process. If I remember my statistics correctly, in order to say that something is significant and useful ..." I too would be inclined to regard the near consensus of a task force, in which I had confidence, as significant--but not in the statistical sense.

The law of large numbers (lines 37, 38) also relies on consequences of random sampling for its validity.
It asserts that the average (mean) of a large number of values, each sampled independently from a fixed population, will with high probability lie close to the corresponding population average. Note that it is the size of the sample, and not the size of the population, that leads to the desirable consequence for the sample average. But, once again, the concept is not relevant to your problem, because you are not in a random sampling situation.

Confounded factors and control (lines 43--45) are important ideas in experimental design, or in the interpretation of randomly generated data. For example, if someone marketing a study guide were to report an experiment showing that the students using the guide had scores 10% higher than those who didn't use the guide, I would want to know more about the experiment before parting with any money. If it turned out that all the students using the guide were several years older than the students not using the guide, I would suspect that the observed difference in performance might be confounded with an age effect, and not just a benefit of the guide.

You described some of your precautions to avoid domination of the discussion by a single personality. You concluded that (line 54) "I believe it would be improbable that such unanimity would be attributable to chance." Again, you seemed to be drawing an inappropriate parallel with the random sampling situation, where extreme outcomes are possible, but only with small probability. In a situation where issues are discussed and opinions defended, the final opinion of a group is not at all analogous to flipping a bunch of pennies and counting up the number of heads. Even if you had chosen your task force at random from the constituency, subsequent discussion and interaction between members of the task force would nullify the random sampling interpretation of the final conclusions.

Consider an analogy.
Suppose you wished to determine what fraction of a large population is well disposed towards a particular brand of orange juice. If you take a random sample and obtain a response from each person in the sample, you might be able to draw conclusions about the larger population. [I say 'might' because it is notoriously difficult to pose questions that do not influence a response. For example, "Do you think it is important for children to get an adequate supply of vitamin C, from sources such as our orange juice?" or "If you had a choice between water and our product, which would you choose?" or "Do you like our product?" might elicit different responses. I am no expert at such aspects of sampling, but I have been burned enough to be wary.] If you placed the members of your random sample in a room and allowed them to discuss the issues before casting votes, would you expect no effect on the initial opinions? If not, why even bother to have discussions?

On line 66 ("better than a 50/50 chance") you seem to be drawing an analogy between good faith efforts to make fair choices and the notion of a fair coin--or should I say a more-than-fair coin? Such analogies cannot retrofit the randomness required for the sampling interpretation. Indeed, suppose you had gone to the trouble of having the various groups elect their representatives for the task force. Even then, could you be sure that the discussions had not shifted the representatives away from a perceived duty to represent the group opinion? I am reminded of an analogy with the selection of delegates to a party's convention. I would not want to assert that the final outcomes would be the same as if a random sample of a larger population were taken. Representativeness can have different meanings.

How would you respond to a cynic who asserted that you had subconsciously chosen task force members who tended to agree with your own position?
Or that the choice of persuasive group leaders had an important effect on the discussion? Or that by feeding your task force well you made them better disposed to ideas that they perceived as being important to your organization? In sampling and experimental design, randomization is the defence against a charge of unconscious bias in selection of the sample or the allocation of treatments.

In summary, you cannot appeal to blind chance for support of your judgement.

I do not want to leave you with the impression that your task force was automatically a waste of time. Indeed, random sampling is not the only way to learn what people think. It is not the way to reach a consensus of opinion. Much of the policy of my own university is determined by faculty committees, which I hope are not just random samples of faculty members thrown together around a conference table. We do not select US senators as random samples of size 2 from the population of states, or supreme courts as random samples of size 9 from all judges. I do not select a random textbook when I want to read about a particular topic. We all try to exercise judgement. Of course, if someone disagrees with our judgements then we have to use powers of persuasion, rather than appeals to probability theory, to win support.

Sincerely,
David Pollard

PS. Several of my students suggested that you might find the textbook for the course ("Introduction to the Practice of Statistics", by David Moore and George McCabe, Freeman and Co, New York, 1998) useful if you want a refresher on statistics. The book is quite readable, with many real examples.

1 Dear mavin of math,
2
3 I have a question regarding statistics which I am hoping someone in
4 your department could help me answer. It has been some time since I
5 had statistics in college and to say that I am rusty is an
6 understatement. Before I ask the question you will need some
7 background information.
8
9 I work for an organization that issues standards for use by public
10 entities. We try to obtain input and feedback from our constituents
11 to ensure the standards meet their needs. We recently put together a
12 task force of 30 people selected from among our constituents to help
13 us evaluate a proposed standard. The 30 members of the task force
14 were judgmentally selected by my organization. The emphasis in
15 selecting these task force members was on obtaining views from the
16 many organizations and groups that constitute our constituency. Thus
17 a representative was selected from each organization and group. We
18 selected multiple representatives from the larger of the organizations
19 and groups.
20
21 One member of our staff asked what use the task force would be in
22 helping us choose from among the many directions we could go with the
23 proposed standard. I responded as follows, "If 28 out of the 30
24 members agree on one option, that would be indicative to me that there
25 may be a similar trend in the population." If there is a similar
26 trend in the population we could use the task force input to choose an
27 option that would best meet the needs of our constituency.
28
29 In making the above statement I am not saying that we could
30 extrapolate the results to the population to say that 28 out of 30
31 members of the population would also choose the same option. The
32 sample was not randomly selected. Not all organizations and groups
33 that constitute our constituency are equally represented on the task
34 force based on membership. Further, although we know the size of the
35 membership of some of the organizations and groups that constitute our
36 constituency, we don't know the size of all of them. However,
37 together they compose a sufficiently large number (over 5000) that the
38 law of large numbers applies to the population (our constituency).
39
40 What I am saying is that, if 28 out of 30 members choose the same
41 option, this must be significant and useful to our process. If I
42 remember my statistics correctly, in order to say that something is
43 significant and useful one must also be able to say that confounding
44 factors do not account for the unanimity. We are attempting to
45 control confounding factors at the task force meeting. Thus in
46 selecting the task force members we attempted to find individuals who
47 will represent the views of the groups from which they are selected.
48 Further, all options will be presented to the task force without bias.
49 The task force is split into four small groups so that no one
50 personality will dominate and influence the choices of all others.
51 The moderators of each sub-group will give each member equal time in
52 the discussions. Therefore, if 28 out of 30 task force members arrive
53 at the same conclusion, I believe it would be improbable that such
54 unanimity would be attributable to chance, some confounding factor, or
55 anything other than the merits of the options. Thus, I believe such
56 an outcome would be significant no matter what applicable test for
57 significance one used.
58
59 But what about useful? I intuit that if 28 out of 30 task force
60 members reach the same conclusion it is not just significant, but it
61 also has predictive value. Perhaps the question becomes, "just how
62 representative of the population is the sample?" Since the sample was
63 not randomly selected I can't quantify what percentage of the
64 population will reach the same conclusions (plus or minus some range,
65 with a given degree of error). Yet, for each member of the task force
66 I believe there is better than a 50/50 chance that they are
67 representative of the group they purport to represent. If not, we
68 have done a lousy job of selecting the task force members.
We have,
69 however, invested significant time and effort in selecting the task
70 force members to the end that they would be representative of their
71 various groups. So, with that background information, here is my
72 question:
73
74 Given a high degree of unanimity among the task force members (28 out
75 of 30 in agreement) and further given that they are more likely than
76 not to represent (be representative of) their various groups, and
77 assuming (or not) that the various groups from which they were chosen
78 are equal in size, could I say with better than, say, 65% confidence
79 that most members of the population (our constituency) would reach the
80 same conclusion as the 28 reached?
81
82 That would be the ideal outcome for the task force; that they help us
83 select an option that most of our constituents would agree with.
84
85 If you or your colleagues don't have the time to answer this
86 question(s) would you please recommend some book(s) on the topic that
87 I could slog through without sinking over my head. Your help is
88 greatly appreciated.
89