Stat200 lab5

Statistics 200: Lab 5 (Friday 13 February 1998)

Today's tasks:
More about functions: default arguments, variable numbers of arguments and return values. Idiot-proofing. Looping and conditional computations.

Problem 1 (How Splus deals with missing values)

Look at the function mean(). Figure out what all the extra arguments are used for (in particular, figure out how the optional arguments work). We will need to feed it some messed-up data to see how it deals with it. Try mean(month.name).

Create a data vector with a missing value:

junk <- c(1:10, NA)

Explain the output from each of the following:

mean(junk)
mean(junk,0,T)
mean(junk,T)
mean(junk, na.rm=T)
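To see what na.rm = T does, it can help to remove the missing value by hand and compare the results. This is a minimal sketch written in R, a freely available dialect of S, and it is assumed to behave the same way in Splus. The names complete, m.default, m.byhand and m.narm are made up for illustration.

```r
# Rebuild the vector from the lab: the numbers 1 to 10 plus one missing value
junk <- c(1:10, NA)

# is.na() flags the missing entry; indexing with its negation drops it
complete <- junk[!is.na(junk)]

m.default <- mean(junk)                 # NA: one missing value contaminates the mean
m.byhand  <- mean(complete)             # mean of 1, ..., 10
m.narm    <- mean(junk, na.rm = TRUE)   # the same thing, done inside mean()
```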

Try to find some form of input that mean() cannot handle.

Homework (Protecting your function from wrong input; to be demonstrated to JAH or BM): Create a function mymean() that accepts the same arguments and has the same defaults as mean(), but which gives appropriate warnings if the user does silly things. Then add another optional argument, explain (FALSE by default), so that your warnings appear only if the user calls mymean() with explain=T. Hint: Look at the help for the functions stop(), warning() and missing().
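The pattern the hint points to can be shown on a different, hypothetical function, standardize(), so as not to give away mymean() itself. This is a minimal sketch in R, assumed to carry over to Splus (the wording of the printed messages may differ): stop() rejects input that cannot be handled at all, missing() detects an omitted argument, and warning() is issued only when an explain flag is set.

```r
# Hypothetical example (not a solution to mymean): rescale x to mean 0 and
# standard deviation 1, with checks in the style the homework asks for.
standardize <- function(x, explain = FALSE)
{
        # missing() is TRUE when the caller gave no value for x
        if(missing(x))
                stop("standardize() needs a data vector x")
        # stop() aborts the call and prints its message
        if(!is.numeric(x))
                stop("x must be numeric")
        # a standard deviation needs at least two values
        s <- if(length(x) > 1) sqrt(var(x)) else NA
        # warning() lets the call finish but reports a likely problem;
        # here it is only issued when the user asks for explanations
        if(explain && (is.na(s) || s == 0))
                warning("the standard deviation of x is zero or missing, so the result is NaN or NA")
        (x - mean(x))/s
}
```

With explain left at FALSE, standardize(c(5, 5, 5)) quietly returns NaN values; with explain = TRUE it also issues the warning.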

Problem 2 (Avoiding loops)

Here is a simple little function that generates a given number (replicates) of samples of a given size (sample.size) from a standard normal distribution, calculates the mean of each sample, and then plots a histogram of the set of means:

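resample<-
function(sample.size, replicates)
{
        out <- numeric(replicates)      # storage for the sample means
        for(i in seq(1, replicates)) {
                samp <- rnorm(sample.size)      # one sample from N(0,1)
                out[i] <- mean(samp)            # store its mean
        }
        hist(out)                       # histogram of the means
}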

Check that you understand how the function works. What does each of the commands do? You will notice that the function uses a for() loop. Loops are not very efficient in Splus, but there is usually a way of avoiding them.

Rewrite the function to remove the for() loop: generate a matrix of random normals (using rnorm) with dim equal to c(replicates, sample.size), then use apply() to compute the mean of each row. What advantages or disadvantages do you see with each form of the function? Hint: Try some big numbers.
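A minimal sketch of the loop-free version described above, written in R and assumed to carry over to Splus. The name resample2 is hypothetical, and returning the means invisibly (rather than the value of hist()) is our own choice, made so that the result can be inspected.

```r
# Hypothetical loop-free version of resample(): draw all the data at once
# as a replicates x sample.size matrix, one sample per row.
resample2 <- function(sample.size, replicates)
{
        samp <- matrix(rnorm(replicates * sample.size),
                nrow = replicates, ncol = sample.size)
        # apply() with MARGIN = 1 calls mean() on each row (each sample)
        out <- apply(samp, 1, mean)
        hist(out)
        invisible(out)    # return the means without printing them
}
```

For example, m <- resample2(20, 50) draws the histogram and stores the 50 means in m. One trade-off to look for when trying big numbers: this version holds all replicates * sample.size random numbers in memory at once, while the loop version only ever holds one sample.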

Problem 3 (Default values, and arbitrary collections of arguments)

Modify the function so that the default sample.size is 100 and the default number of replicates is 10. Allow for an optional argument called trim, which defaults to 0, to be passed to the mean function. Also allow arbitrary collections of named arguments to be passed to the hist function. For example,

resample(50,trim = 0.25, xlab="samples of size 50")

should use sample.size = 50 and the default value of replicates, calculate a 25% trimmed mean, and use the xlab string as the x-axis label of the histogram. Make your function return something useful, such as a set of summary statistics for the generated (trimmed) means.

Hint: You will need help on using variable numbers of arguments; see the "..." argument as described on page 95 of Venables & Ripley.
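One possible shape for the Problem 3 function, as a minimal sketch in R (assumed to carry over to Splus). The name resample3 is hypothetical, and it builds on the matrix-and-apply() idea from Problem 2. Named arguments that match none of the formal arguments are collected by "..." and passed on unchanged to hist(), while trim is handed to mean() through apply().

```r
# Hypothetical sketch: defaults for sample.size and replicates, a trim
# argument handed to mean(), and "..." collecting any other named
# arguments (xlab, main, col, ...) for hist().
resample3 <- function(sample.size = 100, replicates = 10, trim = 0, ...)
{
        samp <- matrix(rnorm(replicates * sample.size), nrow = replicates)
        # extra arguments given to apply() after the function are passed to mean()
        out <- apply(samp, 1, mean, trim = trim)
        hist(out, ...)
        summary(out)      # returned as the value of the call
}
```

With this definition, the call resample3(50, trim = 0.25, xlab = "samples of size 50") matches 50 to sample.size by position, leaves replicates at 10, and returns summary() of the ten trimmed means.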

Problem 4 (Protecting your function from wrong input)

Make your function stop, printing out an appropriate message, if sample.size or replicates are given incorrectly. For example, your function should protest at being given the following arguments:

resample(-3,10)
resample(month.name)

What other sorts of input should you protect against? Hint: ?stop
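A minimal sketch of the kind of checks Problem 4 asks for, in R (assumed to carry over to Splus). Both check.count() and resample4 are hypothetical names; the helper keeps the repeated test for "a single positive whole number" in one place. The check on trim reflects the fact that mean() trims that fraction from each end, so values above 0.5 make no sense.

```r
# Hypothetical helper: stop with a message unless n is a single positive
# whole number. 'what' is used only to make the message specific.
check.count <- function(n, what)
{
        if(!is.numeric(n) || length(n) != 1 || is.na(n))
                stop(paste(what, "must be a single number"))
        if(n < 1 || n != round(n))
                stop(paste(what, "must be a positive whole number, not", n))
        invisible(TRUE)
}

resample4 <- function(sample.size = 100, replicates = 10, trim = 0, ...)
{
        # refuse bad input before any random numbers are generated
        check.count(sample.size, "sample.size")
        check.count(replicates, "replicates")
        if(!is.numeric(trim) || length(trim) != 1 || is.na(trim) ||
                trim < 0 || trim > 0.5)
                stop("trim must be a single number between 0 and 0.5")
        samp <- matrix(rnorm(replicates * sample.size), nrow = replicates)
        out <- apply(samp, 1, mean, trim = trim)
        hist(out, ...)
        summary(out)
}
```

Each of resample4(-3, 10), resample4(month.name) and resample4(2.5) now stops with a message naming the offending argument.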

Problem 5 (Generalizing your function even further)

Modify your function so that it is not restricted to generating random samples from the standard normal. For example, you could allow the mean and variance of the normal to be specified, or you could allow different distributions (uniform, gamma, ...).
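A minimal sketch of one way to generalize, in R (assumed to carry over to Splus). The generator is passed in as a function, rgen, together with a list, gen.args, of its extra arguments; both names and resample5 itself are hypothetical. A list is used because "..." is already reserved for hist(), and do.call() builds the call rgen(n, <contents of gen.args>).

```r
# Hypothetical sketch: the random-number generator is itself an argument.
# rgen must take the number of values wanted as its first argument (as
# rnorm, runif and rgamma do); gen.args is a list of its other arguments.
resample5 <- function(sample.size = 100, replicates = 10, trim = 0,
        rgen = rnorm, gen.args = list(), ...)
{
        # do.call() calls rgen with n followed by the arguments in gen.args
        x <- do.call(rgen, c(list(replicates * sample.size), gen.args))
        samp <- matrix(x, nrow = replicates)
        out <- apply(samp, 1, mean, trim = trim)
        hist(out, ...)
        summary(out)
}
```

For example, resample5(30, rgen = rgamma, gen.args = list(shape = 2)) uses gamma samples, and resample5(30, gen.args = list(mean = 5, sd = 2)) uses a normal with mean 5 and variance 4. Note that rnorm() is parameterized by the standard deviation, so a variance must be converted with sqrt() first.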