## Statistics 200: Lab 5 (Friday 13 February 1998)

Today's tasks:

*More about functions. Default arguments, variable numbers of arguments,
return values. Idiot-proofing. Looping and conditional computations.*
#### Problem 1 (How Splus deals with missing values)

Look at the function `mean`. Figure out what all the extra arguments
are used for. (In particular, figure out how the optional arguments work.)
We will need to feed it some messed-up data to see how it copes.
Try `mean(month.name)`.
Create a data vector with a missing value:

    junk <- c(1:10,NA)

Explain the output from each of the following:

    mean(junk)
    mean(junk,0,T)
    mean(junk,T)
    mean(junk, na.rm=T)

Try to find some form of input data that `mean()` cannot handle.
**Homework (Protecting your function from wrong input):** (to be demonstrated
to JAH or BM) Create a function `mymean()` that accepts the same arguments
and has the same defaults as `mean`, but which gives appropriate
warnings if the user does silly things. Then add another optional argument,
*explain* (which is false by default), so that your
warnings appear only if the user calls `mymean()` with `explain=T`. Hint:
Look at the help for the functions `stop()`, `warning()`
and `missing()`.
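
As a rough illustration of the three functions named in the hint, the sketch below guards a small made-up function, `checked.range()` (a hypothetical name, not part of the lab or of Splus). It is written in R syntax on the assumption that `stop()`, `warning()` and `missing()` behave the same way in Splus; it is not a solution to the homework, only a pattern to adapt.

```r
# Hypothetical example: returns the range of a numeric vector,
# checking its input first.
checked.range <- function(x, explain = FALSE) {
  # missing() is TRUE when the caller did not supply the argument at all
  if (missing(x)) stop("argument 'x' must be supplied")
  # stop() abandons the call with an error message
  if (!is.numeric(x)) stop("'x' must be numeric")
  # warning() reports a problem but lets the computation continue
  if (explain && any(is.na(x))) {
    warning("'x' contains missing values; they are ignored")
  }
  range(x, na.rm = TRUE)
}

checked.range(c(3, 1, NA))                  # silent: explain is FALSE by default
checked.range(c(3, 1, NA), explain = TRUE)  # same result, plus a warning
```

Calling `checked.range(month.name)` or `checked.range()` stops with an error rather than returning a misleading answer.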

#### Problem 2 (Avoiding loops)

Here is a simple little function that generates a given number (`=replicates`)
of samples of given size (`= sample.size`) from a standard normal
distribution, calculates the mean for each sample, then plots a histogram
of the set of means:

    resample <- function(sample.size, replicates)
    {
        out <- vector()
        for(i in seq(1, replicates)) {
            samp <- rnorm(sample.size)
            out[i] <- mean(samp)
        }
        hist(out)
    }

Check that you understand how the function works. What does each of the commands
do? Notice that the function uses a `for()` loop. Loops
are not very efficient in Splus, but there is usually a way of avoiding
them.
Rewrite the function to remove the `for()` loop, by generating
a matrix of random normals (using `rnorm`) with `dim` equal
to `c(replicates,sample.size)`, then `apply()` the `mean`
function. What advantages or disadvantages do you see with each form of
the function? Hint: Try some big numbers.
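
The matrix-plus-`apply()` idea described above can be tried on a small scale before rewriting `resample`. The sketch below is written in R syntax, on the assumption that `matrix()`, `apply()` and `set.seed()` behave alike in Splus; the names `draws` and `row.means` are made up for the illustration.

```r
set.seed(1)          # only so that the sketch is reproducible
replicates  <- 4
sample.size <- 3

# One row per replicate, one column per observation within a sample.
# The draws are independent, so the fill order of the matrix does not matter.
draws <- matrix(rnorm(replicates * sample.size),
                nrow = replicates, ncol = sample.size)

# MARGIN = 1 applies the function to each row in turn
row.means <- apply(draws, 1, mean)

dim(draws)       # replicates by sample.size
row.means        # one mean per replicate
```

Note that the whole matrix is held in memory at once, which is worth keeping in mind when you try big numbers.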

#### Problem 3 (Default values, and arbitrary collections of arguments)

Modify the function so that the default `sample.size` is 100 and
the default number of `replicates` is 10. Allow for an optional
argument called `trim`, which defaults to 0, to be passed to the
`mean` function. Also allow for arbitrary collections of named arguments
to be passed to the `hist` function. For example,

    resample(50,trim = 0.25, xlab="samples of size 50")

should use `sample.size` = 50 and the default value of `replicates`; it should
calculate a 25% trimmed mean and write the `xlab` text under the histogram.
Make your function return something useful, such as a set
of summary statistics for the generated (trimmed) means.
Hint: You need to find help on using variable numbers of arguments:
see "`...`" as described on page 95 of Venables & Ripley.
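
To show how "`...`" forwards arguments, here is a minimal sketch using a made-up function, `centre.and.spread()` (a hypothetical name, not from the lab). It passes a named `trim` argument to `mean` and hands everything else to `quantile`, in the same way that `resample` is asked to hand extra arguments to `hist`. The code is R syntax and assumes Splus treats "`...`" and default values the same way.

```r
# Hypothetical example: 'trim' has a default and goes to mean();
# any further named arguments are forwarded untouched to quantile().
centre.and.spread <- function(x, trim = 0, ...) {
  list(centre    = mean(x, trim = trim),
       quantiles = quantile(x, ...))
}

# 'probs' is not an argument of centre.and.spread(); it travels via "..."
centre.and.spread(1:10, trim = 0.1, probs = c(0.25, 0.75))
```

Because `trim` is matched by name, the order in which the caller writes the optional arguments does not matter.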

#### Problem 4 (Protecting your function from wrong input)

Make your function stop, printing out an appropriate message, if `sample.size`
or `replicates` are given incorrectly. For example, your function
should protest at being given the following arguments:

    resample(-3,10)
    resample(month.name)

What other sorts of input should you protect against? Hint: `?stop`
#### Problem 5 (Generalizing your function even further)

Modify your function so that it is not restricted to generating random
samples from the standard normal. For example, you could allow the mean
and variance of the normal to be specified, or you could allow different
distributions (uniform, gamma, ...).
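
One possible route, sketched here under assumptions rather than prescribed, is to pass the random number generator itself as an argument. The function `one.mean()` and its argument `rgen` are hypothetical names invented for this illustration; the code is R syntax and assumes that `rnorm` and `runif` take the same named parameters in Splus.

```r
# Hypothetical example: 'rgen' is any function whose first argument is the
# number of values to draw; its own parameters arrive through "...".
one.mean <- function(n, rgen = rnorm, ...) {
  mean(rgen(n, ...))
}

one.mean(20)                          # standard normal, the default
one.mean(20, mean = 5, sd = 2)        # normal with a chosen mean and sd
one.mean(20, runif, min = 2, max = 3) # uniform on (2, 3)
```

The same pattern extends to any generator that follows this calling convention, so the distribution need not be hard-coded.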