The *binomial distribution* describes the behavior of a count variable *X* if the following conditions apply:

**1:** The number of observations *n* is fixed.

**2:** Each observation is independent.

**3:** Each observation represents one of two outcomes ("success" or "failure").

**4:** The probability of "success" *p* is the same for each observation.

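If these conditions are met, then the count variable *X* has a binomial distribution with parameters *n* and *p*, abbreviated *B(n,p)*.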

__Example__

Suppose individuals with a certain gene have a 0.70 probability of eventually contracting
a certain disease. If 100 individuals with the gene participate in a lifetime study, then the
number of individuals who will contract the disease is distributed *B(100,0.7)*.

*Note: The sampling distribution of a count variable is only well-described by the binomial
distribution in cases where the population size is significantly larger than the sample size.
As a general rule, the binomial distribution should not be applied to observations from
a simple random sample (SRS) unless the
population size is at least 10 times larger than the sample size.*
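The effect of population size can be illustrated with a small sketch. Sampling without replacement from a finite population follows the hypergeometric distribution rather than the binomial, and the two agree more closely as the population grows relative to the sample. The values `n = 20`, `p = 0.3`, and the population sizes below are illustrative assumptions, not figures from the text.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def hypergeom_pmf(k, n, successes, population):
    """P(k successes) when drawing n items without replacement."""
    return (comb(successes, k) * comb(population - successes, n - k)
            / comb(population, n))

def max_pmf_gap(n, p, population):
    """Largest absolute difference between the two pmfs over k = 0..n."""
    successes = round(p * population)
    return max(
        abs(binom_pmf(k, n, p) - hypergeom_pmf(k, n, successes, population))
        for k in range(n + 1)
    )

# Illustrative values (assumptions): sample of 20, success proportion 0.3.
n, p = 20, 0.3
for population in (40, 200, 2000):
    # Populations 2x, 10x, and 100x the sample size.
    print(population, max_pmf_gap(n, p, population))
```

The printed gap should shrink as the population grows, which is the sense in which the binomial distribution becomes a reasonable description of an SRS count.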

To find probabilities from a binomial distribution, one may either calculate them directly,
use a binomial table, or use a computer. The number of sixes rolled by a single die in 20
rolls has a *B(20,1/6)* distribution. The probability of rolling more than 2 sixes
in 20 rolls, *P(X > 2)*, is equal to 1 - *P(X ≤ 2)* = 1 - (*P(X=0) + P(X=1) +
P(X=2)*). Using the MINITAB command "cdf" with subcommand "binomial n=20 p=0.166667" gives the cumulative
distribution function as follows:

Binomial with n = 20 and p = 0.166667

| x | P( X <= x) |
|---|------------|
| 0 | 0.0261 |
| 1 | 0.1304 |
| 2 | 0.3287 |
| 3 | 0.5665 |
| 4 | 0.7687 |
| 5 | 0.8982 |
| 6 | 0.9629 |
| 7 | 0.9887 |
| 8 | 0.9972 |
| 9 | 0.9994 |

[Figure: probability density function and cumulative distribution function for the *B(20,1/6)* distribution]

Since the probability of 2 or fewer sixes is equal to 0.3287, the probability of rolling more than 2 sixes = 1 - 0.3287 = 0.6713.
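The same calculation can be done by computer without MINITAB. The following is a minimal sketch in Python using only the standard library; the helper names `binom_pmf` and `binom_cdf` are illustrative, not part of any package.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(x, n, p):
    """P(X <= x), summing the pmf over k = 0..x."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

n, p = 20, 1 / 6                      # 20 rolls, probability 1/6 of a six
p_at_most_2 = binom_cdf(2, n, p)      # P(X <= 2)
p_more_than_2 = 1 - p_at_most_2       # P(X > 2) by the complement rule

print(f"P(X <= 2) = {p_at_most_2:.4f}")
print(f"P(X > 2)  = {p_more_than_2:.4f}")
```

The printed values should agree with the tabulated cumulative probability for x = 2 and with the complement computed from it.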

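**The probability that a random variable X with binomial distribution B(n,p) is
equal to the value k, where k = 0, 1, ..., n, is given by**

$$
P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}, \quad \text{where} \quad \binom{n}{k} = \frac{n!}{k!\,(n-k)!}.
$$

The latter expression is known as the *binomial coefficient*, read "n choose k": the number of possible ways to choose *k* "successes" from *n* observations.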
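As a worked instance of the formula $P(X=k) = \binom{n}{k} p^{k}(1-p)^{n-k}$, consider the probability of exactly 2 sixes in 20 rolls of a die, where *X* is *B(20,1/6)*:

$$
\begin{aligned}
P(X = 2) &= \binom{20}{2} \left(\frac{1}{6}\right)^{2} \left(\frac{5}{6}\right)^{18}, \qquad \binom{20}{2} = \frac{20!}{2!\,18!} = \frac{20 \cdot 19}{2} = 190 \\
&= 190 \cdot \frac{1}{36} \cdot \left(\frac{5}{6}\right)^{18} \approx 0.198
\end{aligned}
$$

This matches the difference between the cumulative probabilities for x = 2 and x = 1 in the MINITAB output, 0.3287 - 0.1304 = 0.1983, up to rounding.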

The mean of a *B(n,p)* distribution is *np*, and its variance is *np(1-p)*.
These definitions are intuitively logical. Imagine, for example, 8 flips
of a coin. If the coin is fair, then *p* = 0.5. One would expect the
mean number of heads to be half the flips, or *np* = 8 × 0.5 = 4. The
variance is equal to *np(1-p)* = 8 × 0.5 × 0.5 = 2.

In the example of rolling a six-sided die 20 times, the probability *p* of rolling
a six on any roll is 1/6, and the count *X* of sixes has a *B(20, 1/6)* distribution.
The mean of this distribution is 20/6 = 3.33, and the variance is 20 × 1/6 × 5/6 = 100/36 = 2.78.
The mean of the *proportion* of sixes in the 20 rolls, *X/20*, is equal to
*p* = 1/6 = 0.167, and the variance of the proportion is equal to (1/6 × 5/6)/20 = 0.007.
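These values can be checked by summing directly over the probability function rather than relying on the closed forms *np* and *np(1-p)*. The sketch below does this for the dice example; the function names are illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def moments_from_pmf(n, p):
    """Mean and variance of B(n, p), summed directly from the pmf."""
    mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
    var = sum((k - mean) ** 2 * binom_pmf(k, n, p) for k in range(n + 1))
    return mean, var

n, p = 20, 1 / 6
mean, var = moments_from_pmf(n, p)

print(mean, n * p)               # count: summed mean vs. np
print(var, n * p * (1 - p))      # count: summed variance vs. np(1-p)
print(mean / n, var / n**2)      # proportion X/n: mean p, variance p(1-p)/n
```

Dividing the count by *n* scales the mean by 1/*n* and the variance by 1/*n*², which is where the proportion's variance *p(1-p)/n* comes from.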

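For large values of *n*, the distribution of the count *X* is approximately normal, with mean *np* and variance *np(1-p)*.

*Note: Because the normal approximation is not accurate for small values of n, a good rule of
thumb is to use the normal approximation only if np>10 and
np(1-p)>10.*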
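The rule of thumb is easy to encode as a check. The helper below is a hypothetical convenience function written for this illustration, with the threshold of 10 taken from the note.

```python
def normal_approx_ok(n, p, threshold=10):
    """Rule of thumb from the note: np > 10 and np(1-p) > 10."""
    return n * p > threshold and n * p * (1 - p) > threshold

# 20 die rolls with p = 1/6: np is about 3.33, so the rule is not met.
print(normal_approx_ok(20, 1 / 6))

# 200 observations with p = 0.4: np = 80 and np(1-p) = 48, so the rule is met.
print(normal_approx_ok(200, 0.4))
```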

For example, consider a population of voters in a given state. The true proportion of voters who favor candidate A is equal to 0.40. Given a sample of 200 voters, what is the probability that more than half of the voters support candidate A?

The count *X* of voters in the sample of 200 who support candidate A is distributed
*B(200,0.4)*. The mean of the distribution is equal to 200 × 0.4 = 80, and the variance is equal
to 200 × 0.4 × 0.6 = 48. The standard deviation is the square root of the variance, 6.93. The
probability that more than half of the voters in the sample support candidate A is equal to
the probability that *X* is greater than 100, which is equal to 1 - *P(X ≤ 100)*.

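To use the normal approximation to calculate this probability, we should first acknowledge that
the normal distribution is *continuous* and apply the *continuity correction*.
This means that the probability for a single discrete value, such as 100, is extended to the
probability of the interval (99.5, 100.5). Because we are interested in the probability that
*X* is less than or equal to 100, the normal approximation is applied at the upper limit of
the interval, 100.5.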

So, applying the continuity correction and standardizing the variable *X* gives the following:

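$$
\begin{aligned}
1 - P(X \le 100) &= 1 - P(X < 100.5) \\
&= 1 - P\left(Z < \frac{100.5 - 80}{6.93}\right) \\
&= 1 - P\left(Z < \frac{20.5}{6.93}\right) \\
&= 1 - P(Z < 2.96) \\
&= 1 - 0.9985 = 0.0015
\end{aligned}
$$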
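The continuity-corrected calculation can be checked numerically and compared with the exact binomial tail. The sketch below assumes the standard normal CDF is computed from `math.erf` and evaluates the approximation at 100.5, the upper end of the interval around 100; the variable names are illustrative.

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def binom_cdf(x, n, p):
    """Exact P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

n, p = 200, 0.4
mean = n * p                      # 80
sd = sqrt(n * p * (1 - p))        # square root of 48

# Continuity correction: P(X <= 100) is approximated by P(Z < (100.5 - mean) / sd).
z = (100.5 - mean) / sd
approx_tail = 1 - normal_cdf(z)

# Exact binomial probability that more than 100 of the 200 support candidate A.
exact_tail = 1 - binom_cdf(100, n, p)

print(f"z = {z:.2f}")
print(f"normal approximation: {approx_tail:.4f}")
print(f"exact binomial:       {exact_tail:.4f}")
```

Both results are small probabilities of the same order, so the normal approximation is serviceable here, though the two need not agree exactly this far out in the tail.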