Statistics 200: Lab 10 (Friday 3 April 1998)

Today's tasks: Multivariate plots.

Today we learn a few methods that are useful when we have multivariate data and want to plot it (unfortunately we are restricted to two-dimensional paper). Splus has many graphical routines that can be used to produce graphical presentations of multivariate data.

Generating some multivariate data

We shall start by generating a data set. We first note that Splus does not have a built-in routine for generating random multivariate normal observations. The help file for the rnorm() command gives Splus code for a function rmultnorm() that generates multivariate normal observations.

Generate one data set containing 200 observations from a N(mu1,Sigma1) distribution and another data set containing 200 observations from a N(mu2,Sigma2) distribution. Let mu1 and mu2 be vectors of length 5 (make them different) and let Sigma1 and Sigma2 be 5 by 5 matrices. The matrices have to be symmetric and positive definite; you could just use the identity matrix (?diag) if you want to. Combine the two data sets into a single data set, which should contain 400 observations, each consisting of 5 values. Call this sample samp. We shall use it to demonstrate two useful commands for visualizing multivariate data.
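
One possible way to carry out this step is sketched below. The helper rmvn.sketch() is a hypothetical name introduced here for illustration; it is not the rmultnorm() function from the rnorm() help file, and it simply uses the Cholesky factor of Sigma. The particular values of mu1, mu2, Sigma1 and Sigma2 are arbitrary assumptions.

```r
# Hypothetical helper (NOT the rmultnorm() from the help file):
# draws n observations from N(mu, Sigma) via the Cholesky factor of Sigma.
rmvn.sketch <- function(n, mu, Sigma) {
  p <- length(mu)
  z <- matrix(rnorm(n * p), nrow = n, ncol = p)  # n x p independent N(0,1) values
  R <- chol(Sigma)                               # upper triangular, t(R) %*% R = Sigma
  x <- z %*% R                                   # each row now has covariance Sigma
  sweep(x, 2, mu, "+")                           # add mu[j] to column j
}

# Arbitrary illustrative parameters (assumptions, choose your own)
mu1 <- rep(0, 5)
mu2 <- rep(4, 5)
Sigma1 <- diag(5)
Sigma2 <- diag(5)

# Stack the two samples: 400 observations, 5 values each
samp <- rbind(rmvn.sketch(200, mu1, Sigma1),
              rmvn.sketch(200, mu2, Sigma2))
dim(samp)
```

With the means this far apart the two groups should separate clearly in the plots that follow; moving mu2 closer to mu1 makes them overlap.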

Plotting multivariate data points

Try the commands:
> win.graph()
> spin(samp)

What do you think? Try using some of the boxes in the corner to "spin" the data. You can change the variables that are used in the plot (only three can be shown at once). Can you see how to do this?

A command that is similar to spin but more general and even more interactive is brush.

Try the command:
> brush(samp)

What do you think? Can you work out how to use the "brush"? The default brush size is quite large. Can you change it?

If you chose mu1 and mu2 to be far apart relative to the spread set by Sigma1 and Sigma2, try to mark one of the two groups of data using the brush. What does Splus give you when you exit from the brush command?

Some other plotting routines that you have already seen can be used to plot multivariate data. Try pairs, for example.

If you know a bit about multivariate data analysis then you might want to try a pairs plot of the principal component scores for the data. Splus has two sets of principal components routines, called prcomp and princomp. In fact, if you're lucky, a plot of the first two principal components could provide a good description of the data. If you don't know what principal components are then you might want to ask BM or JH for a brief introduction to them.
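
A minimal sketch of the pairs and principal components plots follows. The data here are a stand-in for samp built directly from rnorm() with identity covariance (an assumption made to keep the example self-contained), and the sketch assumes that prcomp returns the scores in a component called x.

```r
# Stand-in for samp: two groups of 200 observations on 5 variables,
# identity covariance, means 0 and 4 (illustrative assumption)
samp <- rbind(matrix(rnorm(200 * 5), 200, 5),
              matrix(rnorm(200 * 5, mean = 4), 200, 5))

# Scatterplot matrix of the raw variables
pairs(samp)

# Principal components; the scores are assumed to be in pc$x
pc <- prcomp(samp)
pairs(pc$x)

# The first two components on their own
plot(pc$x[, 1], pc$x[, 2], xlab = "PC 1", ylab = "PC 2")
```

Because the two group means differ in every coordinate, most of the variation lies along the line joining them, so the first component alone should separate the groups here.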

Plotting bivariate functions

Suppose that you have a bivariate function and you would like to see the shape of the function. There are many ways of plotting the function and many are easily implemented in Splus. We need to approximate the function on a grid and then Splus will be able to plot the function.

Think of a function f(x,y) that takes real values (e.g. exp(-(x^2+y^2))).
Construct an increasing sequence of x values (say 20 values) and an increasing sequence of y values (say 30 values).
Try the command expand.grid(x,y). What does it do?
Evaluate your function at each point returned by expand.grid. Store the values in a 20 by 30 matrix called z.
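
The steps above might be sketched as follows, using the example function exp(-(x^2+y^2)). The range -2 to 2 is an arbitrary assumption. The columns of the expand.grid output are picked out by position rather than by name.

```r
# Example function from the text
f <- function(x, y) exp(-(x^2 + y^2))

# Increasing sequences: 20 x values and 30 y values (range is an assumption)
x <- seq(-2, 2, length = 20)
y <- seq(-2, 2, length = 30)

# All 600 (x, y) combinations; the first column varies fastest
grid <- expand.grid(x, y)

# Evaluate f at every grid point and fill a 20 by 30 matrix column by column,
# so that z[i, j] = f(x[i], y[j])
z <- matrix(f(grid[, 1], grid[, 2]), nrow = 20, ncol = 30)
dim(z)
```

The same matrix can also be obtained in one step with outer(x, y, f), which is a useful check on the orientation of z.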

Try the following commands:
> persp(x,y,z)
> contour(x,y,z)
> image(x,y,z)

The function contour has an argument add. Try doing an image plot and then superimposing a contour plot.
What do the many other arguments of the functions do? In particular, what is the eye option in the persp plot used for?

Using the mfrow option in par, put four representations of your function on one screen. (I leave the choice of the fourth up to you!)
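
One possible layout is sketched below. The choice of an image plot with contours superimposed as the fourth panel is just one option, and the grid range is an assumption.

```r
# Grid and function values (z[i, j] = f(x[i], y[j]))
x <- seq(-2, 2, length = 20)
y <- seq(-2, 2, length = 30)
z <- outer(x, y, function(x, y) exp(-(x^2 + y^2)))

# Split the graphics device into a 2 by 2 array of panels
par(mfrow = c(2, 2))

persp(x, y, z)             # panel 1: perspective (wireframe) plot
contour(x, y, z)           # panel 2: contour plot
image(x, y, z)             # panel 3: image plot
image(x, y, z)             # panel 4: image plot ...
contour(x, y, z, add = T)  # ... with contours drawn on top of it

# Restore the single-panel layout
par(mfrow = c(1, 1))
```

Because contour is called with add = T, it draws into the panel that image has just filled instead of moving on to a new one.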

Plotting bivariate functions from data

You may have a data set that contains (x,y) pairs of values and a z value that depends on each (x,y) pair. You may be interested in plotting the z values as a function of (x,y). This sort of data is not always ready for immediate input into the Splus plotting functions, because the (x,y) points need not lie on a regular grid. However, Splus contains a function interp that interpolates the function and gives output that is ready for use by the persp, image, and contour functions. You should generate some (x,y) data and a vector of z values to test this function or, even better, look at the Old Faithful dataset that is built into Splus, or any other dataset that may come to mind.