
Statistics 200: Lab 5

Today's tasks:
More about functions. Default arguments, variable numbers of arguments, return values. Idiot-proofing. Looping and conditional computations.

Problem 1

Look at the function mean(). Figure out what all the extra arguments are for. (In particular, figure out how the optional arguments work.) Feed it some messed-up data. Try mean(month.name). Create a vector with a missing value: junk <- c(1:10,NA).

Explain the output from:

mean(junk, na.rm=T)
Try to find some form of input that mean() cannot handle.
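As a starting point for the exploration above, here is a minimal sketch, assuming a recent version of R (other S dialects may print things differently). The names m_default, m_narm and m_trim are made up for illustration; only mean() and its trim and na.rm arguments come from the function being studied.

```r
# Vector with one missing value, as in the problem statement
junk <- c(1:10, NA)

# Show the formal arguments of the default method for mean()
args(mean.default)

m_default <- mean(junk)                      # the NA propagates to the result
m_narm    <- mean(junk, na.rm = TRUE)        # NA dropped before averaging
m_trim    <- mean(c(1:9, 1000), trim = 0.1)  # lowest and highest value dropped

print(c(m_default, m_narm, m_trim))
```

Comparing m_trim with the untrimmed mean(c(1:9, 1000)) shows what trim does to a single wild value.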

Today's homework (to be demonstrated to DP or BM): Create a function mymean() that accepts the same arguments and has the same defaults as mean, but which gives some form of appropriate warning if the user does silly things. Then add another optional argument, explain (which is false by default), to the function, so that your warnings appear only if the user calls mymean() with explain=T. Hint: Look at the help for the functions stop(), warning() and missing().
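To see how the three hinted functions behave together, here is a rough sketch rather than a model answer. The name mymean_sketch, the wording of the messages, and the decision to silence mean()'s own warnings with suppressWarnings() are all assumptions made for illustration, and the code assumes a recent version of R.

```r
# Hypothetical sketch: one way to combine missing(), stop() and warning()
mymean_sketch <- function(x, trim = 0, na.rm = FALSE, explain = FALSE, ...) {
  # missing() is TRUE when the caller did not supply the argument
  if (missing(x)) stop("argument 'x' must be supplied")

  if (explain) {
    # warning() reports a problem but lets the computation continue
    if (!is.numeric(x) && !is.logical(x) && !is.complex(x))
      warning("'x' is not numeric, logical or complex")
    else if (!na.rm && any(is.na(x)))
      warning("'x' contains missing values; consider na.rm = TRUE")
  }

  # Assumption of this sketch: hide mean()'s own warnings so that
  # 'explain' alone decides whether the user sees anything
  suppressWarnings(mean(x, trim = trim, na.rm = na.rm, ...))
}

# Silent by default, even though the result is NA
quiet <- mymean_sketch(c(1:10, NA))

# With explain = TRUE the warning is raised; tryCatch() captures its text
loud <- tryCatch(mymean_sketch(c(1:10, NA), explain = TRUE),
                 warning = function(w) conditionMessage(w))
print(loud)
```

Whether a given problem deserves stop() (give up) or warning() (carry on) is a design decision you should be ready to defend.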

Problem 2

Here is a simple little function that generates a given number (= replicates) of samples of a given size (= sample.size) from a standard normal distribution, calculates the mean for each sample, then plots a histogram of the set of means:
resample <- function(sample.size, replicates) {
	# out collects one sample mean per replicate
	out <- vector()
	for(i in seq(1, replicates)) {
		samp <- rnorm(sample.size)
		out[i] <- mean(samp)
	}
	hist(out)
}
Rewrite the function to remove the for( ) loop, by generating a matrix of random normals (using rnorm) with dim equal to c(replicates,sample.size), then apply() the mean function. What advantages or disadvantages do you see with each form of the function? Hint: Try some big numbers.
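The matrix approach described above might look roughly like this. The name resample_apply and the choice to return the means invisibly are assumptions of the sketch; the dimensions follow the c(replicates, sample.size) layout given in the problem.

```r
# Hypothetical loop-free sketch
resample_apply <- function(sample.size, replicates) {
  # One row per replicate, one column per observation
  draws <- matrix(rnorm(replicates * sample.size),
                  nrow = replicates, ncol = sample.size)
  out <- apply(draws, 1, mean)   # MARGIN = 1 applies mean() to each row
  hist(out)
  invisible(out)                 # return the means without printing them
}

# Rough timing with larger inputs
system.time(means <- resample_apply(100, 5000))
```

Note that this version holds all replicates * sample.size draws in memory at once, whereas the loop keeps only one sample at a time. Wrapping the loop version in system.time() in the same way gives a direct comparison.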

Problem 3

Modify the function so that the default sample.size is 100 and the default number of replicates is 10. Allow for an optional argument called trim, which defaults to 0, to be passed to the mean function. Also allow for arbitrary collections of named arguments to be passed to the hist function. For example,
resample(50,trim = 0.25, xlab="samples of size 50")
should use sample.size = 50, replicates = the default value, and it should calculate a 25% trimmed mean and write the xlab under the histogram. Make your function return something useful, such as a set of summary statistics for the generated (trimmed) means. Hint: You need to find help on variable numbers of arguments: see "special argument ..." as described on page 95 of Venables & Ripley.
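The mechanics of the special argument ... can be sketched as follows. The name resample_dots and the use of summary() as the return value are assumptions for illustration, not the required solution.

```r
# Hypothetical sketch of defaults plus forwarding '...' to hist()
resample_dots <- function(sample.size = 100, replicates = 10, trim = 0, ...) {
  draws <- matrix(rnorm(replicates * sample.size), nrow = replicates)
  # Extra arguments given to apply() are handed on to mean()
  out <- apply(draws, 1, mean, trim = trim)
  # Anything the caller supplied beyond the named arguments lands in '...'
  hist(out, ...)
  summary(out)   # value returned to the caller
}

# 50 matches sample.size by position, trim by name, xlab falls into '...'
stats <- resample_dots(50, trim = 0.25, xlab = "samples of size 50")
print(stats)
```

Because replicates is not mentioned in the call, it keeps its default of 10.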

Problem 4

Make your function stop, printing out an appropriate message, if sample.size or replicates are given incorrectly. For example, your function should protest at
What other sorts of input should you protect against? Hint: ?stop
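One possible shape for such checks is sketched below. The helper check_count, the function resample_checked, the message wording and the call with a negative sample size are all hypothetical; they are not the example intended by the problem.

```r
# Hypothetical helper: stops with a message unless 'value' is a single
# positive whole number
check_count <- function(value, name) {
  ok <- is.numeric(value) && length(value) == 1 &&
        !is.na(value) && value >= 1 && value == round(value)
  if (!ok) stop(name, " must be a single positive whole number")
  invisible(TRUE)
}

resample_checked <- function(sample.size = 100, replicates = 10) {
  check_count(sample.size, "sample.size")
  check_count(replicates, "replicates")
  apply(matrix(rnorm(replicates * sample.size), nrow = replicates), 1, mean)
}

# Capture the error message instead of halting the script
msg <- tryCatch(resample_checked(sample.size = -5),
                error = function(e) conditionMessage(e))
print(msg)
```

The same helper also rejects character strings, vectors of length greater than one, NA and fractional values, which is a useful list to compare against your own.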

Problem 5

(For enthusiasts only). Modify your function so that it is not restricted to generating random samples from the standard normal. For example, you could allow the mean and variance of the normal to be specified, or you could allow different distributions (uniform, gamma, ...).
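One way to approach this, sketched under assumptions: pass the generating function itself as an argument. The names resample_dist and rdist are hypothetical, and here ... carries distribution parameters to the generator (so it is no longer available for hist(), as it was in Problem 3).

```r
# Hypothetical sketch: 'rdist' is any generator whose first argument is
# the number of draws (rnorm, runif, rgamma, ...)
resample_dist <- function(sample.size = 100, replicates = 10,
                          rdist = rnorm, ...) {
  # '...' carries distribution parameters through to the generator
  draws <- matrix(rdist(replicates * sample.size, ...), nrow = replicates)
  apply(draws, 1, mean)
}

normal_means  <- resample_dist(50, 20, mean = 10, sd = 2)
uniform_means <- resample_dist(50, 20, rdist = runif, min = 0, max = 1)
gamma_means   <- resample_dist(50, 20, rdist = rgamma, shape = 2)
```

If both the generator and hist() need extra arguments, one of the two sets has to be passed some other way, for example as a list.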