Mean and Variance of Random Variables

Mean

The mean of a discrete random variable X is a weighted average of the possible values that the random variable can take. Unlike the sample mean of a group of observations, which gives each observation equal weight, the mean of a random variable weights each outcome x_i according to its probability, p_i. The common symbol for the mean (also known as the expected value of X) is \(\mu_X\), formally defined by

\[ \mu_X = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k = \sum_{i=1}^{k} x_i p_i \]

The mean of a random variable provides the long-run average of the variable, or the expected average outcome over many observations.

Example

Suppose an individual plays a gambling game where it is possible to lose $1.00, break even, win $3.00, or win $5.00 each time she plays. The probability distribution for each outcome is provided by the following table:

Outcome		-$1.00	$0.00	$3.00	$5.00	
Probability	  0.30	 0.40	 0.20	 0.10

The mean outcome for this game is calculated as follows:

\(\mu_X\) = (-1*0.3) + (0*0.4) + (3*0.2) + (5*0.1) = -0.3 + 0.0 + 0.6 + 0.5 = 0.8.
In the long run, then, the player can expect to win about 80 cents per play; the odds are in her favor.
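
The weighted sum above can be checked with a few lines of Python. This is only a minimal sketch: the variable names `outcomes`, `probs`, and `mean` are illustrative choices, not part of the original text.

```python
# Payouts and probabilities taken from the table above.
outcomes = [-1.00, 0.00, 3.00, 5.00]
probs = [0.30, 0.40, 0.20, 0.10]

# The probabilities of a valid distribution must sum to 1.
assert abs(sum(probs) - 1.0) < 1e-9

# Mean = each outcome weighted by its probability, then summed.
mean = sum(x * p for x, p in zip(outcomes, probs))
print(round(mean, 2))  # 0.8
```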

For a continuous random variable, the mean is defined by the density curve of the distribution. For a symmetric density curve, such as the normal density, the mean lies at the center of the curve.

The law of large numbers states that the observed sample mean from an increasingly large number of independent observations of a random variable approaches the distribution mean \(\mu\). That is, as the number of observations increases, the mean of these observations tends to move closer and closer to the true mean of the random variable. This does not imply, however, that short-term averages will reflect the mean.

In the above gambling example, suppose a woman plays the game five times, with the outcomes $0.00, -$1.00, $0.00, $0.00, -$1.00. She might assume, since the true mean of the random variable is $0.80, that she will win the next few games in order to "make up" for the fact that she has been losing. Unfortunately for her, this logic has no basis in probability theory. The law of large numbers does not apply for a short string of events, and her chances of winning the next game are no better than if she had won the previous game.
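
A small simulation can illustrate both points: short runs may land far from the mean, while very long runs settle near $0.80. The sketch below assumes independent plays drawn from the table above; the seed value, the helper name `average_of_plays`, and the run lengths are arbitrary choices made for illustration.

```python
import random

# Payouts and probabilities from the gambling example.
outcomes = [-1.00, 0.00, 3.00, 5.00]
probs = [0.30, 0.40, 0.20, 0.10]

# Fixed seed (an arbitrary choice) so the run is repeatable.
rng = random.Random(0)

def average_of_plays(n):
    """Simulate n independent plays and return the average payout."""
    plays = rng.choices(outcomes, weights=probs, k=n)
    return sum(plays) / n

# Short runs can be far from 0.80; long runs tend to be close to it.
for n in (5, 100, 10_000, 200_000):
    print(n, round(average_of_plays(n), 3))

long_run_average = average_of_plays(200_000)
```

Each simulated play is drawn without reference to earlier plays, which is exactly why a losing streak gives no information about the next outcome.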

Properties of Means

If a random variable X is adjusted by multiplying by the value b and adding the value a, then the mean is affected as follows:

\[ \mu_{a+bX} = a + b\,\mu_X \]

Example

In the above gambling game, suppose the casino realizes that it is losing money in the long term and decides to adjust the payout levels by subtracting $1.00 from each prize. The new probability distribution for each outcome is provided by the following table:

Outcome		-$2.00	-$1.00	 $2.00	 $4.00	
Probability	  0.30	  0.40	  0.20	  0.10

The new mean is (-2*0.3) + (-1*0.4) + (2*0.2) + (4*0.1) = -0.6 + -0.4 + 0.4 + 0.4 = -0.2. This is equivalent to subtracting $1.00 from the original value of the mean, 0.8 - 1.00 = -0.2. With the new payouts, the casino can expect to win about 20 cents per play in the long run.

Suppose that the casino decides that the game does not have an impressive enough top prize with the lower payouts, and decides to double all of the prizes, as follows:

Outcome		-$4.00	-$2.00	 $4.00	 $8.00	
Probability	  0.30	  0.40	  0.20	  0.10

Now the mean is (-4*0.3) + (-2*0.4) + (4*0.2) + (8*0.1) = -1.2 + -0.8 + 0.8 + 0.8 = -0.4. This is equivalent to multiplying the previous value of the mean by 2, increasing the expected winnings of the casino to 40 cents per play.

Overall, the difference between the original value of the mean (0.8) and the new value of the mean (-0.4) may be summarized by (0.8 - 1.0)*2 = -0.4.
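
The two adjustments combine into a single linear transformation: subtracting $1.00 and then doubling gives 2(X - 1) = -2 + 2X, so a = -2 and b = 2. The following sketch (with an illustrative helper named `mean`) compares the mean computed directly from the adjusted payouts with the value a + b times the original mean.

```python
# Original payouts and probabilities from the gambling example.
outcomes = [-1.00, 0.00, 3.00, 5.00]
probs = [0.30, 0.40, 0.20, 0.10]

def mean(values, weights):
    """Probability-weighted average of the values."""
    return sum(x * p for x, p in zip(values, weights))

# Subtract $1.00, then double: 2(X - 1) = -2 + 2X.
a, b = -2.0, 2.0

# Apply the adjustment to every payout: [-4.0, -2.0, 4.0, 8.0].
transformed = [a + b * x for x in outcomes]

direct = mean(transformed, probs)        # mean of the adjusted game
by_rule = a + b * mean(outcomes, probs)  # a + b * (original mean)
print(round(direct, 2), round(by_rule, 2))  # -0.4 -0.4
```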

The mean of the sum of two random variables X and Y is the sum of their means:

\[ \mu_{X+Y} = \mu_X + \mu_Y \]

For example, suppose a casino offers one gambling game whose mean winnings are -$0.20 per play, and another game whose mean winnings are -$0.10 per play. Then the mean winnings for an individual who plays both games at once are -$0.20 + -$0.10 = -$0.30 per round.

Variance

The variance of a discrete random variable X measures the spread, or variability, of the distribution, and is defined by

\[ \sigma_X^2 = \sum_{i=1}^{k} (x_i - \mu_X)^2 \, p_i \]

The standard deviation is the square root of the variance.

Example

In the original gambling game above, the probability distribution was defined to be:

Outcome		-$1.00	$0.00	$3.00	$5.00	
Probability	  0.30	 0.40	 0.20	 0.10

The variance for this distribution, with mean = 0.8, may be calculated as follows:
(-1 - 0.8)²*0.3 + (0 - 0.8)²*0.4 + (3 - 0.8)²*0.2 + (5 - 0.8)²*0.1
= (-1.8)²*0.3 + (-0.8)²*0.4 + (2.2)²*0.2 + (4.2)²*0.1
= 3.24*0.3 + 0.64*0.4 + 4.84*0.2 + 17.64*0.1
= 0.972 + 0.256 + 0.968 + 1.764 = 3.960, with standard deviation = 1.990.
Since there is not a very large range of possible values, the variance is small.
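
The same calculation can be reproduced in Python as a check on the arithmetic. This is a minimal sketch with illustrative variable names; it uses only the values from the table above.

```python
import math

# Payouts and probabilities from the original gambling game.
outcomes = [-1.00, 0.00, 3.00, 5.00]
probs = [0.30, 0.40, 0.20, 0.10]

# Mean: probability-weighted average of the outcomes.
mean = sum(x * p for x, p in zip(outcomes, probs))

# Variance: probability-weighted average of squared deviations from the mean.
variance = sum((x - mean) ** 2 * p for x, p in zip(outcomes, probs))

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(round(variance, 3), round(std_dev, 3))  # 3.96 1.99
```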

Properties of Variances

If a random variable X is adjusted by multiplying by the value b and adding the value a, then the variance is affected as follows:

\[ \sigma_{a+bX}^2 = b^2 \sigma_X^2 \]

Since the spread of the distribution is not affected by adding or subtracting a constant, the value a is not considered. And, since the variance is a sum of squared terms, any multiplier value b must also be squared when adjusting the variance.

Example

As in the case of the mean, consider the gambling game in which the casino chooses to lower each payout by $1.00, then double each prize. The resulting distribution is the following:

Outcome		-$4.00	-$2.00	 $4.00	 $8.00	
Probability	  0.30	  0.40	  0.20	  0.10

The variance for this distribution, with mean = -0.4, may be calculated as follows:
(-4 -(-0.4))²*0.3 + (-2 - (-0.4))²*0.4 + (4 - (-0.4))²*0.2 + (8 - (-0.4))²*0.1
= (-3.6)²*0.3 + (-1.6)²*0.4 + (4.4)²*0.2 + (8.4)²*0.1
= 12.96*0.3 + 2.56*0.4 + 19.36*0.2 + 70.56*0.1
= 3.888 + 1.024 + 3.872 + 7.056 = 15.84, with standard deviation = 3.980.
This is equivalent to multiplying the original value of the variance by 4, the square of the multiplying constant.
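
As a check, the sketch below computes the variance of the adjusted game directly and compares it with b² times the original variance, using a = -2 and b = 2 (lowering each payout by $1.00 and then doubling gives 2(X - 1) = -2 + 2X). The helper name `variance` is an illustrative choice.

```python
# Original payouts and probabilities from the gambling example.
outcomes = [-1.00, 0.00, 3.00, 5.00]
probs = [0.30, 0.40, 0.20, 0.10]

def variance(values, weights):
    """Probability-weighted average of squared deviations from the mean."""
    mu = sum(x * p for x, p in zip(values, weights))
    return sum((x - mu) ** 2 * p for x, p in zip(values, weights))

# Lower each payout by $1.00, then double: 2(X - 1) = -2 + 2X.
a, b = -2.0, 2.0
transformed = [a + b * x for x in outcomes]

direct = variance(transformed, probs)        # variance of the adjusted game
by_rule = b ** 2 * variance(outcomes, probs) # b^2 * (original variance); a drops out
print(round(direct, 2), round(by_rule, 2))  # 15.84 15.84
```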

For independent random variables X and Y, the variance of their sum or difference is the sum of their variances:

\[ \sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2 \]
\[ \sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2 \]

Variances are added for both the sum and difference of two independent random variables because the variation in each variable contributes to the variation in each case. If the variables are not independent, then variability in one variable is related to variability in the other. For this reason, the variance of their sum or difference may not be calculated using the above formula.

For example, suppose the amount of money (in dollars) a group of individuals spends on lunch is represented by variable X, and the amount of money the same group of individuals spends on dinner is represented by variable Y. The variance of the sum X + Y cannot be calculated as the sum of the variances, since X and Y cannot be assumed to be independent.
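
The addition rule and its failure under dependence can both be seen by enumeration. The sketch below is an illustration built on assumptions not in the text above: X and Y are taken to be two independent plays of the original gambling game, and the dependent case is the extreme one where Y always equals X. The helper names are hypothetical.

```python
from itertools import product

# One play of the original gambling game, as (payout, probability) pairs.
game = [(-1.00, 0.30), (0.00, 0.40), (3.00, 0.20), (5.00, 0.10)]

def mean(dist):
    """Probability-weighted average over (value, probability) pairs."""
    return sum(x * p for x, p in dist)

def variance(dist):
    """Probability-weighted average of squared deviations from the mean."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist)

# Independent plays X and Y: the joint probability is the product P(x) * P(y).
sum_dist = [(x + y, px * py) for (x, px), (y, py) in product(game, game)]
diff_dist = [(x - y, px * py) for (x, px), (y, py) in product(game, game)]

var_x = variance(game)
var_sum = variance(sum_dist)    # equals var_x + var_x
var_diff = variance(diff_dist)  # also equals var_x + var_x

# Fully dependent case: Y always equals X, so X + Y = 2X.
var_dependent_sum = variance([(2 * x, p) for x, p in game])

print(round(var_x, 2), round(var_sum, 2), round(var_diff, 2),
      round(var_dependent_sum, 2))  # 3.96 7.92 7.92 15.84
```

In the dependent case the variance of the sum is 15.84 rather than 7.92, so adding the variances would understate the spread.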