Today we learn a few methods that are useful when we have multivariate data and want to produce a plot of it (unfortunately we are restricted to two-dimensional paper). Splus has many graphical routines that can be used to produce graphical presentations of multivariate data.
Generate one data set containing 200 observations from a N(mu1,Sigma1) distribution and another data set containing 200 observations from a N(mu2,Sigma2) distribution. Let mu1 and mu2 be vectors of length 5 (make them different) and let Sigma1 and Sigma2 be 5 by 5 matrices. They have to be symmetric and positive definite; you could just use the identity matrix (?diag) if you want to. Combine the two data sets into a single data set, which should contain 400 observations, each consisting of 5 values. Call this sample samp. We shall use it to demonstrate two useful commands for visualizing multivariate data.
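One possible way to build samp is sketched below. It is only a sketch: the helper rmvn, the seed, and the particular choices of mu1, mu2, Sigma1 and Sigma2 are illustrative assumptions, not part of the exercise. The helper uses the Cholesky factor of Sigma, so it does not rely on any built-in multivariate normal generator.

```r
# Illustrative choices only; pick your own mu1, mu2, Sigma1, Sigma2.
set.seed(1)                      # arbitrary seed, so the run is repeatable
n <- 200
p <- 5
mu1 <- rep(0, p)
mu2 <- rep(4, p)
Sigma1 <- diag(p)                                # identity matrix
Sigma2 <- 0.5 * diag(p) + matrix(0.5, p, p)      # variance 1, covariance 0.5

# Hypothetical helper: n draws from N(mu, Sigma).
# chol() returns an upper triangular R with t(R) %*% R equal to Sigma,
# so the rows of Z %*% R have covariance Sigma.
rmvn <- function(n, mu, Sigma) {
  Z <- matrix(rnorm(n * length(mu)), nrow = n)
  sweep(Z %*% chol(Sigma), 2, mu, "+")           # add mu to every row
}

# Stack the two groups: rows 1-200 are group 1, rows 201-400 are group 2.
samp <- rbind(rmvn(n, mu1, Sigma1), rmvn(n, mu2, Sigma2))
dim(samp)
```

Keeping the groups in a known row order makes it easy to check later whether a plot has separated them.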
Try the command,
> spin(samp)
What do you think? Try using some of the boxes in the corner. "Spin" the data. You can change the variables that are used in the plot (only three can be used at a time). Can you see how to do this?
A command that is similar to spin but more general and even more interactive is brush.
Try the command,
> brush(samp)
What do you think? Can you work out how to use the "brush"? The default brush size is quite large. Can you change it?
If you chose mu1, mu2, Sigma1 and Sigma2 so that the two groups are well separated, then try to mark one of the groups of data using the brush. What does Splus give you when you exit from the brush command?
Some other plotting routines that you have already seen can be used to plot multivariate data. Try pairs, for example.
If you know a bit about multivariate data analysis then you might want to try a pairs plot of the principal component scores for the data. Splus has two sets of principal components routines, called prcomp and princomp. In fact, if you're lucky, a plot of the first two principal components could provide a good description of the data. If you don't know what principal components are then you might want to ask BM or JH for a brief introduction to them.
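A minimal sketch of the principal components idea follows, using princomp. The stand-in data (two groups with identity covariance and means 0 and 4) and the names pc and scores are assumptions made so that the example runs on its own; use your own samp instead.

```r
# Stand-in for samp: two groups of 200 observations on 5 variables.
set.seed(2)                      # arbitrary seed
samp <- rbind(matrix(rnorm(200 * 5), 200, 5),
              matrix(rnorm(200 * 5, mean = 4), 200, 5))

pc <- princomp(samp)             # principal components analysis
scores <- pc$scores              # one row of scores per observation

pairs(scores)                    # pairs plot of all five components

# The first two components on their own
plot(scores[, 1], scores[, 2],
     xlab = "Component 1", ylab = "Component 2")
```

With groups this far apart, most of the separation should show up along the first component.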
Think of a function f(x,y) that takes real values (e.g. exp(-(x^2+y^2))).
Construct an increasing sequence of x values (say 20 values) and an
increasing sequence of y values (say 30 values).
Try the command expand.grid(x,y). What does it do?
Evaluate your function at each point that expand.grid
returns. Store the values in a 20 by 30 matrix called z.
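The steps above might be carried out as in the sketch below. The range -2 to 2 and the name grid are assumptions; any increasing sequences will do. The first column of the expand.grid result varies fastest, which is why the values can be poured straight into a 20 by 30 matrix.

```r
f <- function(x, y) exp(-(x^2 + y^2))

x <- seq(-2, 2, length = 20)     # 20 increasing x values
y <- seq(-2, 2, length = 30)     # 30 increasing y values

grid <- expand.grid(x, y)        # 600 rows, one per (x, y) pair
dim(grid)

# Column 1 holds the x values, column 2 the y values.
# matrix() fills column by column, so z[i, j] is f(x[i], y[j]).
z <- matrix(f(grid[, 1], grid[, 2]), nrow = 20, ncol = 30)
```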
Try the following commands,
> persp(x,y,z)
> contour(x,y,z)
> image(x,y,z)
The function contour has an argument add. Try doing
an image plot and then superimposing a contour plot.
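A short sketch of the superimposed plot is given below. For brevity it builds z with outer rather than expand.grid, which is a swapped-in shortcut that gives the same matrix; the range of x and y is again an assumption.

```r
f <- function(x, y) exp(-(x^2 + y^2))
x <- seq(-2, 2, length = 20)
y <- seq(-2, 2, length = 30)
z <- outer(x, y, f)              # z[i, j] is f(x[i], y[j])

image(x, y, z)                   # draw the image first
contour(x, y, z, add = TRUE)     # then draw contours on top of it
```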
What do the many other arguments of the functions do? In particular
what is the eye option in the persp plot used for?
Using the mfrow option in par, put four representations
of your function on one screen. (I leave the choice of the fourth up to you!)
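One way to lay out four plots on a single screen is sketched below. The fourth panel, a slice through the surface at a fixed y value, is just one possible choice, and the name old is an assumption.

```r
f <- function(x, y) exp(-(x^2 + y^2))
x <- seq(-2, 2, length = 20)
y <- seq(-2, 2, length = 30)
z <- outer(x, y, f)

old <- par(mfrow = c(2, 2))      # 2 by 2 layout; keep the old setting
persp(x, y, z)
contour(x, y, z)
image(x, y, z)
plot(x, z[, 15], type = "l",     # slice of the surface at y[15]
     ylab = "f(x, y[15])")
par(old)                         # restore the previous layout
```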