
Statistics 200: Lab 1 (Friday 5 September 1997)

Today's tasks: Getting into and out of S-Plus. Help! Saving your work. Incorporating S-Plus output in reports. Introduction to lists, vectors, matrices, functions, and graphics. An illustrative example.

If you get lost on the Web, look at http://statlab.stat.yale.edu (look under ABOUT THE STATLAB for Course Materials, then follow the links through to the syllabus for Stat 200).

Start Splus by clicking on the icon with the little blue (green?) squares. (Note: You do want to clean up the data directory, at least on your first session.) Try a few calculations. (The > sign at the start of a line is the prompt; you don't type it. I have included some of the responses from Splus.) Follow along with the calculations, typing them in at your own machine. Try to understand the response from Splus. We will come around the class and ask you for your interpretations.

> pi
[1] 3.141593
> x<-sqrt(3)
> x
[1] 1.732051
> y <- c(2,3,4)
> mm <- matrix(1:10,2,5)

Interpret: x^y, y^x, y^mm, x^mm, x+y, y+mm. What is the difference between mean(x), mean(mm), mean(), and mean?

> z <- seq(1.5,1.9,0.1)
> z
[1] 1.5 1.6 1.7 1.8 1.9
> z[2]
[1] 1.6
> z[1:3]
[1] 1.5 1.6 1.7
> z>pi/2
[1] F T T T T
> z[z>pi/2]
[1] 1.6 1.7 1.8 1.9

What happened?

Getting help

Try the help menu. Use the search button. Find out what asin() does. Also, try typing

> help(asin)
> ?asin

How many ways are there to get help on an Splus function?

Get help on the q() function to find out how to quit.

Explain why

> asin(x/2)/pi

gives the value it does.

Matrices

Try:

> mm>5
     [,1] [,2] [,3] [,4] [,5] 
[1,]    F    F    F    T    T
[2,]    F    F    T    T    T
> mm[mm>5]
[1]  6  7  8  9 10
> mm + (mm > 5)   # true = 1, false = 0
> dim(mm)
> dim(y)
> length(mm)
> length(y)

What does this say about how the matrix is stored?
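
The comment on mm + (mm > 5) says that true counts as 1 and false as 0 in arithmetic. As a small separate sketch of that rule (the name big and the cutoff 1.65 are made up for illustration; the code is written to run in R and should behave the same way in Splus):

```r
z <- seq(1.5, 1.9, 0.1)   # same vector as earlier in the lab

big <- z > 1.65           # logical vector: F F T T T
sum(big)                  # each T counts as 1, so this counts the values above 1.65
big * 10                  # arithmetic turns T into 1 and F into 0 before multiplying
```

Summing a logical vector is a quick way to count how many elements satisfy a condition.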

Try:

> AA <- matrix(1:9,3,3)
> BB <- matrix(1:9,3,3,byrow=T)
> t(AA)      # t() is the transpose function

Now try some multiplications:

> AA * AA
> AA %*% AA

What do these two products represent? Try:

> mm%*%seq(5,length=5)

What happened?

What does diag do? Try:

> diag(1:5) 
> diag(mm)

Lists

> ll <- list() 
> ll$first <- "Hello there"
> ll$m<-mm
> ll$last <- "Goodbye"
> ll

$first:
[1] "Hello there"
$m:
     [,1] [,2] [,3] [,4] [,5] 
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
$last:
[1] "Goodbye"

Functions

A very simple function:

> test1 <- function(M){
ave<-mean(M)
sum((M-ave)^2)
}
> test1(ll$m)
[1] 82.5

What happened? Try feeding the function some other objects. Try using the fix() function to construct your own version of test1. [It will probably take you a little while to get used to some of the quirks of fix().]

An example

The data set car.all (stored as an Splus data frame) lives in one of the Splus data directories. Try the following:

> names(car.all)
> car <- car.all[sample(1:111,40),c("We","HP","Pri","Mil","Cou")]
> car
> attributes(car) #what are attributes?
> attributes(car$Country)

What did each of those commands do?

You now have a data frame called car whose 40 rows were sampled at random from the 111 rows of car.all, and whose 5 columns were identified uniquely by the first few letters of column names from car.all. (Splus tries very hard to make sense of what you type. If it can identify a unique component of a list or data frame from the first few letters it will not demand the whole name. Beware: there are some places where abbreviations can lead to results that you might not expect. Splus tries hard, but it can't read your mind.)
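
To make the abbreviation rule concrete, here is a minimal sketch using a small list rather than car.all. The list greet and its component names are made up for illustration. The result shown for the ambiguous prefix (NULL) is what R does; Splus may respond differently, so treat that line as an assumption to check on your own machine.

```r
# Hypothetical list, not part of the lab data.
greet <- list(first = "Hello there", last = "Goodbye", latest = "See you")

greet$f      # "f" matches only "first", so this returns "Hello there"
greet$la     # "la" matches both "last" and "latest"; R returns NULL here
greet$last   # an exact name is always found, even though "latest" shares the prefix
```

The second line is the kind of surprise the warning above is about: no error is raised, you just don't get what you expected.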

Try:

 
> table(car$Co)
> win.graph()   #start up a graphics device
> plot(car$W,car$P) # plot prices against weights
> plot(car$W,car$M)
> plot(1/car$W,car$M)  # what is plotted?
> pairs(car)  # plot all possible pairs

Try a linear model with mileage predicted by a linear function of weight:

 
> lm(car$Mil ~car$We)  # problems with missing values?
> lm(car$Mil ~car$We,na.action=na.omit)  # what does NA stand for?

Same idea, but save the output for future reference:

 
> reg1 <- lm(car$Mil ~car$We,na.action=na.omit)
> reg1  # look at it
> names(reg1)  # names of components
> plot(reg1$fit,reg1$res) # a plot of residuals versus fitted values

Try modelling mileage as a linear function of weight and price:

> reg2 <- lm(car$Mil ~car$We+car$Pr,na.action=na.omit)

As you can see, there is a special shorthand for describing statistical models. The same shorthand is used for many statistical functions, but (unfortunately) not for all statistical functions. The bits of Splus that have survived from its original incarnation usually offer fewer bells and whistles than their fancier, more recent offspring. Splus is still growing.
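
One convenience of the model shorthand is that lm() accepts a data argument, so column names can be written once in the formula instead of repeating car$ in every term. The sketch below uses a tiny made-up data frame (the name toy and all of its numbers are invented for illustration, not taken from car.all). It is written to run in R; I believe the same call form works in Splus, but check it on your machine.

```r
# Invented numbers for illustration only; not the car.all data.
toy <- data.frame(Weight  = c(2000, 2500, 3000, 3500, NA),
                  Mileage = c(35, 31, 24, 20, 18))

# Mileage modelled as a linear function of Weight; the row with NA is dropped.
fit <- lm(Mileage ~ Weight, data = toy, na.action = na.omit)

coef(fit)            # intercept and slope
length(resid(fit))   # 4 residuals, because one of the 5 rows had a missing Weight
```

Adding a second predictor is then just a matter of extending the right-hand side of the formula, for example Mileage ~ Weight + Price if the data frame has a Price column.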

Graphics

Try:

> xvalues <- seq(-2,2,by=0.05)
> plot(xvalues,exp(-xvalues^2))

Problem: Use the help for "plot" to figure out how to get a smooth line for the plot (connect the dots). Fancier: Figure out how to put a title on the plot and change the axis labels.

Saving your S objects and getting them back

There are several ways to save data. Some ways leave the data in a form only understood by Splus. You will learn more about this in Lab 2. For the moment, it's enough to learn how to preserve data from one Splus session to the next.

I have been working for a while, creating S objects. This is what I have so far:

> objects()
 [1] ".Last.fixed" ".Last.value" "last.dump"  
 [4] "ll"          "mm"          "test1"      
 [7] "test2"       "test3"       "values"     
[10] "x"           "xvalues"     "y"          
[13] "z"

Now I need coffee. I put my floppy disk into the slot (it becomes drive A:), then dump everything:

> dump(objects(),"A:\\sept4")
[1] "A:\\sept4"

For the purposes of a test, I kill everything, then try to restore it from my floppy disk.

> remove(objects()) # all gone:
> objects()
character(0)
> source("A:\\sept4")  # Back again:
> objects()
 [1] ".Last.fixed" ".Last.value" "last.dump"  
 [4] "ll"          "mm"          "test1"      
 [7] "test2"       "test3"       "values"     
[10] "x"           "xvalues"     "y"          
[13] "z"

Your task for today

Give me a printed sheet, with your name, major (or department and year for grad students), and email address.
Find the sum of the series 1/n^2 from n=1 to n= infinity (or close enough to infinity). Put the answer on your sheet. (Don't copy by hand: use cut and paste.)
Draw the function f(x) = x sin(x) in the range -1 to +1.
(For the enthusiasts.) Generate a sample of 200 random normal variates (?rnorm). Draw a histogram (?hist) of the sample values. Include a picture of the histogram on your sheet.
