# Sampling

Since it is generally impossible to study an entire population (every individual in a country,
all college students, every geographic area, etc.), researchers typically rely on *sampling*
to select a subset of the population on which to perform an experiment or observational study.
It is important that the group selected be representative of the population, and not biased
in a systematic manner. For example, a group composed of the wealthiest individuals in a given
area probably would not accurately reflect the opinions of the entire population in that area.
For this reason, randomization is typically employed to achieve an
unbiased sample. The most common sampling designs are *simple random sampling*,
*stratified random sampling*, and *multistage random sampling*.

### Simple Random Sampling

*Simple random sampling* is the basic sampling technique where we select a group of
subjects (a sample) for study from a larger group (a population). Each
individual is chosen entirely by chance and each member of the population has an equal chance
of being included in the sample. Every possible sample of a given
size has the same chance of selection.

(*Definition taken from Valerie J. Easton and John H. McColl's
Statistics Glossary v1.1*)
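
As a minimal sketch of the definition above, the following Python snippet draws a simple random sample without replacement using the standard library. The population of ID numbers, the sample size, and the function name `simple_random_sample` are assumptions made purely for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every subset of size n is equally likely."""
    rng = random.Random(seed)  # a seed is used only to make the sketch reproducible
    return rng.sample(population, n)


# Hypothetical population: 100 individuals identified by the numbers 1..100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=0)
print(sample)
```

Because `random.sample` selects without replacement, no individual can appear twice in the sample, and each individual has the same chance (here 10 in 100) of being included.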

### Stratified Random Sampling

There may often be factors which divide up the population into sub-populations (groups / strata)
and we may expect the measurement of interest to vary among the different sub-populations.
This has to be accounted for when we select a sample from the population in order that we
obtain a sample that is representative of the population. This is achieved by
*stratified sampling*.
A stratified sample is obtained by taking samples from each stratum or sub-group of a population.

When we sample a population with several strata, we generally require that the
proportion of each stratum in the sample should be the same as in the population.

Stratified sampling techniques are generally used when the population is heterogeneous, or
dissimilar, where certain homogeneous, or similar, sub-populations can
be isolated (strata). Simple random sampling is most appropriate when the entire population
from which the sample is taken is homogeneous. Some reasons for
using stratified sampling over simple random sampling are:

a) the cost per observation in the survey may be reduced;

b) estimates of the population parameters may be wanted for each sub-population;

c) accuracy may be increased at a given cost.

__Example__

Suppose a farmer wishes to work out the average milk yield of each cow type in his
herd which consists of Ayrshire, Friesian, Galloway and Jersey cows. He could
divide up his herd into the four sub-groups and take samples from these.

(*Definition and example taken from Valerie J. Easton and John H. McColl's
Statistics Glossary v1.1*)
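
The farmer's example can be sketched in Python as follows. This is an illustration under stated assumptions: the herd sizes, the 10% sampling fraction, and the helper name `stratified_sample` are all hypothetical. The sketch takes a simple random sample within each stratum, with sample sizes proportional to the stratum sizes.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Take a simple random sample of roughly `fraction` of each stratum."""
    rng = random.Random(seed)

    # Group the units by stratum.
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = {}
    for name, members in strata.items():
        # Proportional allocation, with at least one unit per stratum.
        k = min(len(members), max(1, round(fraction * len(members))))
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical herd: the counts per breed are invented for this sketch.
herd_counts = {"Ayrshire": 40, "Friesian": 30, "Galloway": 20, "Jersey": 10}
herd = [
    (f"{breed}-{i}", breed)
    for breed, count in herd_counts.items()
    for i in range(count)
]

sample = stratified_sample(herd, stratum_of=lambda cow: cow[1], fraction=0.1, seed=0)
sizes = {breed: len(cows) for breed, cows in sample.items()}
print(sizes)
```

With these assumed counts, each breed contributes to the sample in the same proportion as it appears in the herd, so the milk yield can be estimated separately for each breed as well as for the herd as a whole.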

### Multistage Random Sampling

A *multistage random sample* is constructed by taking a series of simple random samples
in stages. This type of sampling is often more practical than simple random sampling for studies
requiring "on location" analysis, such as door-to-door surveys. In a multistage random sample,
a large area, such as a country, is first divided into smaller regions (such as states), and
a random sample of these regions is collected. In the second stage, a random sample of smaller
areas (such as counties) is taken from within each of the regions chosen in the first stage. Then,
in the third stage, a random sample of even smaller areas (such as neighborhoods) is taken from
within each of the areas chosen in the second stage. If these areas are sufficiently small for the
purposes of the study, then the researcher might stop at the third stage. If not, the researcher may
continue sampling within the areas chosen at each stage until appropriately small areas
have been chosen.
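
The staged procedure described above can be sketched as a short recursive function. The nested dictionary of states, counties, and neighborhoods, the number of areas drawn at each stage, and the name `multistage_sample` are assumptions chosen for illustration only.

```python
import random


def multistage_sample(areas, sizes, rng):
    """Sample sizes[0] areas at this stage, then recurse into each chosen area.

    `areas` is either a dict mapping area names to their sub-areas, or a list
    of final units. `sizes` needs one entry per stage.
    """
    k = min(sizes[0], len(areas))
    if isinstance(areas, dict):
        chosen = rng.sample(sorted(areas), k)
        return {name: multistage_sample(areas[name], sizes[1:], rng) for name in chosen}
    # Final stage: a simple random sample of the smallest areas.
    return rng.sample(areas, k)


# Hypothetical country: 5 states, 4 counties per state, 6 neighborhoods per county.
country = {
    f"state_{s}": {
        f"county_{s}_{c}": [f"nbhd_{s}_{c}_{n}" for n in range(6)]
        for c in range(4)
    }
    for s in range(5)
}

rng = random.Random(0)
# Stage 1: 2 states; stage 2: 2 counties per chosen state;
# stage 3: 3 neighborhoods per chosen county.
selected = multistage_sample(country, sizes=[2, 2, 3], rng=rng)
for state, counties in selected.items():
    for county, neighborhoods in counties.items():
        print(state, county, neighborhoods)
```

In this sketch only 2 x 2 x 3 = 12 neighborhoods need to be visited, and they are clustered within four counties, which is what makes the design practical for "on location" studies.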