The normal Q-Q plot (?qqnorm) is a graphical check on the normality of a sample: for a sample from a normal distribution, the points should lie roughly along a straight line. For the plot to be a useful diagnostic, one must know what 'roughly straight' means.
Jargon and fact: If xx is a sample of some sort then the values in sort(xx) are called the order statistics of the sample. The middle 50% of the standard normal distribution occupies an interval of length close to 1.35 (the IQR, the interquartile range).
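As a quick check of both statements, here is a minimal sketch. The seed, the sample size of 10, and the names os and iqr.norm are arbitrary choices made for illustration.

```r
set.seed(1)                              # fixed seed, only so the example is reproducible
xx <- rnorm(10)                          # a sample of size 10
os <- sort(xx)                           # the order statistics, smallest to largest
iqr.norm <- qnorm(0.75) - qnorm(0.25)    # length of the middle 50% of the standard normal
print(round(iqr.norm, 2))                # close to 1.35
```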
Write a function contaminate(n=100,prob.bad=0,mean=0,sd=1,badmean=0,badsd=1) to generate samples of size n from a 'contaminated normal distribution', as follows. Generate n observations from a normal distribution with mean 'mean' and standard deviation 'sd'. Also generate a value k from the Binomial(n,prob.bad) distribution. For the first k observations, multiply the sample value by 'badsd' and then add 'badmean'. (If you want to disguise the bad values, you could return the sample in random order, or in sorted order.)
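The recipe above can be sketched as follows. This is one possible reading of the description, not necessarily the intended solution; returning the sample in random order is the optional disguise step, and the index vector bad is a name introduced here.

```r
contaminate <- function(n = 100, prob.bad = 0, mean = 0, sd = 1,
                        badmean = 0, badsd = 1) {
  x <- rnorm(n, mean = mean, sd = sd)          # n clean observations
  k <- rbinom(1, size = n, prob = prob.bad)    # how many observations get contaminated
  bad <- seq_len(k)                            # indices 1..k (empty when k is 0)
  x[bad] <- x[bad] * badsd + badmean           # rescale, then shift, the first k values
  x[sample.int(n)]                             # shuffle so the bad values are not all in front
}

set.seed(2)
x <- contaminate(n = 100, prob.bad = 0.1, badmean = 5, badsd = 3)
```

With prob.bad=0 the function reduces to rnorm(n,mean,sd) in shuffled order.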
Write a function doublesort(M) that will accept a matrix M of numbers and return the result of first sorting each column and then sorting each row.
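A minimal sketch of the double sort, assuming M has at least two rows and two columns and contains no missing values (sort() drops NAs, which would break the matrix shape). The small matrix M below is made up for illustration.

```r
doublesort <- function(M) {
  M <- apply(M, 2, sort)     # sort each column; the result has the same shape as M
  t(apply(M, 1, sort))       # sort each row; apply() returns the rows as columns, so transpose
}

M <- matrix(c(5, 1, 4, 2, 3, 6), nrow = 2)   # columns are (5,1), (4,2), (3,6)
D <- doublesort(M)
print(D)
```

The transpose matters: apply(M,1,f) stacks the results for each row as columns of its output.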
Write a function orderstat(n,repl) to return 'repl' many observations on the order statistics from a sample of size n from a standard normal distribution. Return your result in a matrix with n rows.
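One way this could look, as a sketch rather than the intended solution: each column holds one simulated sample, sorted, so that row i contains 'repl' independent observations on the i-th order statistic.

```r
orderstat <- function(n, repl) {
  samples <- matrix(rnorm(n * repl), nrow = n, ncol = repl)   # one sample per column
  sorted <- apply(samples, 2, sort)                           # sort within each column
  matrix(sorted, nrow = n)                                    # keep n rows even when n is 1
}

set.seed(4)
os <- orderstat(n = 5, repl = 20)
```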
Write a function drawband(n,repl=100,L=6,H=95) that will draw a pair of curves (call them ylow(x) and yhigh(x) for the moment) showing a 'typical range' within which a qqnorm plot for standard normals should lie. The range from ylow(x) to yhigh(x) should be constructed so that, at each x, about 90% of the qqnorm(rnorm(n)) plots lie in the range. (Note: It would be a much stronger requirement to have the whole qqnorm plot lie between the curves ylow() and yhigh() with probability 90%.) Hint: Use matlines() to draw the curves, interpolating between values constructed at the x values generated by qqnorm(rnorm(n),plot.it=FALSE). Use your function from Problem 5 to generate ylow(x) and yhigh(x).
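The band construction might be sketched as below. It assumes that L and H index the L-th and H-th smallest of the 'repl' simulated values of each order statistic, so that L=6 and H=95 out of 100 leave about 90% in between; that reading is a guess, and the helper name bandvalues is made up here. The sketch also assumes n is at least 2 and repl is at least H.

```r
bandvalues <- function(n, repl = 100, L = 6, H = 95) {
  sims <- apply(matrix(rnorm(n * repl), nrow = n), 2, sort)   # column j: sorted sample j
  byrank <- t(apply(sims, 1, sort))                           # row i: sorted values of i-th order statistic
  x <- sort(qqnorm(rnorm(n), plot.it = FALSE)$x)              # theoretical quantiles used by qqnorm
  list(x = x, ylow = byrank[, L], yhigh = byrank[, H])
}

set.seed(3)
b <- bandvalues(n = 50)
qqnorm(rnorm(50))                                             # a plot to draw the band on
matlines(b$x, cbind(b$ylow, b$yhigh), lty = 2, col = 1)       # lower and upper curves
```

The band is pointwise: each order statistic falls between its two curves about 90% of the time, which is weaker than the whole plot staying inside the band.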
Write a function QQnorm(x,low=???,high=???,rescale=F) with suitable default values for low and high, to draw the qqnorm plot for the vector of data x, with 'error bands' added. If rescale is T, standardize x to have zero median and IQR 1.35 before drawing the plot.
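The rescaling step alone can be sketched like this. The helper name rescale.iqr is hypothetical, and the sketch assumes IQR(x) is nonzero; the constant 1.35 is the approximate IQR of the standard normal quoted earlier.

```r
rescale.iqr <- function(x) {
  (x - median(x)) / IQR(x) * 1.35    # center at the median, then scale so the IQR is 1.35
}

set.seed(5)
z <- rescale.iqr(rnorm(40, mean = 10, sd = 3))
print(c(median(z), IQR(z)))          # approximately 0 and 1.35
```

Using the median and IQR rather than the mean and standard deviation keeps the standardization from being distorted by a few outlying values.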
If you get desperate, you could cheat by finding out the methods I used.
What is the moral of today's lab session?