## Statistics 200: Lab 1 (Friday 16 January 98)

Today's tasks: Getting into and out of S-Plus. Help! Saving your work. Incorporating S-Plus output in reports. Introduction to lists, vectors, matrices, functions, and graphics. An illustrative example.

If you get lost on the Web, look at http://statlab.stat.yale.edu (look under ABOUT THE STATLAB for Course Materials, then follow the links through to the syllabus for Stat 200).

To start Splus, click on the Splus icon (little blue/green squares) in the Applications window. You will be asked if you want to clean up the data directory. Since this is your first time using Splus, you should probably click "Yes". The Splus window contains a command window with a > sign on the left; this is where you enter commands. We are now ready to try a few calculations.

(The > sign at the start of a line is the prompt; you don't type it. The responses that Splus gives have also been included.)

#### Getting Started With Splus

Follow along with these calculations, typing them in at your own machine. Try to understand the response from Splus. We will be asking you how you interpret what Splus has done.

> pi
[1] 3.141593

Yes, Splus knows the value of that famous constant.

> x<-sqrt(3)
> x
[1] 1.732051

> v <- c(2,3,4)
> v

> M <- matrix(1:10,2,5)
> M

Note that Splus is case sensitive!

#### Arithmetic Operations And Functions

You have entered three different variables; now try some arithmetic with them. Interpret each of: x^4, x+3, y*x, x^v, v^x, v+x, M*x, M^x, M+v. Some of these operations are not ones that you would usually think of. What did Splus do in these cases?

Some other arithmetic operators in Splus are: +, -, *, /, ^, %*%, etc.
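As a small illustration of how these operators act on whole vectors, here is a sketch using made-up objects `a` and `b` (they are assumptions for this example, not the lab's x, v and M, so the questions above are still yours to answer). It sticks to basic S syntax, so it should behave the same way in Splus and in R.

```r
# a and b are hypothetical objects introduced only for this sketch
a <- c(10, 20, 30)
b <- 2

a * b            # each element of a is multiplied by b: 20 40 60
a + c(1, 2, 3)   # vectors of equal length combine element by element: 11 22 33
a ^ b            # powers are taken element by element too: 100 400 900
```

Compare what you see here with what happened when the two objects in your own expressions had different lengths or shapes.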

Splus has many built in functions and you will become familiar with these as the course progresses.
To illustrate the use of the mean function, try each of mean(x), mean(M), mean(), and mean (with no parentheses). How do the results differ?

#### Getting help

Try the help menu. Use the search button. Find out what asin() does. Also, try typing

> help(asin)
> ?asin

How many ways are there to get help on an Splus function?

Explain why

> asin(x/2)/pi

gives the value it does.

Get help on the q() function to find out how to quit from Splus.

#### More Vectors And Indexing Vectors

We now see an easy way of entering some types of sequences of numbers into a vector.

> z <- seq(1.5,1.9,0.1)
> z
[1] 1.5 1.6 1.7 1.8 1.9

Vectors can be indexed in many different ways. We illustrate four of them; can you interpret each one?

> z[2]
[1] 1.6
> z[1:3]
[1] 1.5 1.6 1.7

> z>pi/2
[1] F T T T T
> z[z>pi/2]
[1] 1.6 1.7 1.8 1.9

> z[-c(1,3)]
[1] 1.6 1.8 1.9

> names(z)<-c("First","Second","Third","Fourth","Fifth")
> z
First Second Third Fourth Fifth
1.5    1.6   1.7    1.8   1.9
> z[c("Second","Fifth")]
Second Fifth
1.6   1.9

What happened?

#### Matrices (Indexing, Creating And Arithmetic Operations)

Splus handles matrices very well. Here are some examples to illustrate how it deals with them.

> M
> M[2,3]
> M[,3]
> M[2,]
> M[,-c(1,2)]
> M>5
[,1] [,2] [,3] [,4] [,5]
[1,]    F    F    F    T    T
[2,]    F    F    T    T    T

> M[M>5]
[1]  6  7  8  9 10

How are matrices indexed? Remember the ways of indexing vectors.

> M + (M > 5)

How does Splus treat logical variables when included in arithmetic expressions?

> dim(M)
> dim(v)

> length(M)
> length(v)

How does the length function differ from the dim function?

Try the following two ways of creating a matrix:

> AA <- matrix(1:9,3,3)
> BB <- matrix(1:9,3,3,byrow=T)
> t(AA)      # t() is the transpose function

Now try some multiplications:

> AA * AA
> AA %*% AA

What do these two products represent? Try:

> M%*%seq(5,length=5)

What happened?
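To compare `*` and `%*%` on a case small enough to check by hand, here is a sketch with a made-up 2 x 2 matrix `P` (an assumption for this example, not one of the lab's objects). Work out both products on paper first, then see which operator gives which answer.

```r
# P is a hypothetical 2 x 2 matrix used only for this sketch
P <- matrix(c(1, 2, 3, 4), 2, 2)   # filled column by column: rows are (1,3) and (2,4)

P * P      # element-by-element product: each entry is squared
P %*% P    # matrix product: rows of the first times columns of the second
```

For `%*%` the number of columns on the left must match the number of rows (or the vector length) on the right, which is worth keeping in mind when you look at the dimensions of the result above.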

What does the diag function do? Try:

> diag(1:5)
> diag(M)

#### Lists

Lists are an important part of Splus. They are a data format similar to vectors, but more general: a list is a group of Splus objects (whereas an Splus vector is a group of numbers or characters). Lists can be indexed by numbers or by names, just like Splus vectors.

> ll <- list()
> ll\$first <- "Hello there"
> ll\$mat<-M
> ll\$const<-pi
> ll\$last <- "Goodbye"
> ll

\$first:
[1] "Hello there"
\$mat:
[,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
\$const:
[1] 3.141593
\$last:
[1] "Goodbye"
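Indexing a list by name or by number can be sketched as follows. The list `demo` is a made-up object for this example (an assumption, not part of the lab), kept separate from `ll` so you can try the same ideas on `ll` yourself.

```r
# demo is a hypothetical list used only for this sketch
demo <- list(first = "Hello there", const = 3)

names(demo)        # the component names: "first" "const"
demo$const         # pick a component by name
demo[["const"]]    # the same component, name given as a character string
demo[[2]]          # the same component again, picked by position
demo[2]            # single brackets give back a list containing that component
```

Note the difference between the last two lines: double brackets extract the component itself, while single brackets return a shorter list.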

#### Functions

It is easy to write functions in Splus, and in fact, it is a very useful skill to acquire.

Here's a very simple function for you to try first:

> test1 <- function(m){
ave<-mean(m)
sum((m-ave)^2)
}
> test1(ll\$mat)
[1] 82.5

What happened? Try feeding the function some other objects (x, v or M, for example). You can easily change your functions by using the up-arrow key to recall the last lines that were entered. Could you edit test1 so that it outputs the mean of the object also? Think about lists!
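As a hint about how a function can hand back more than one value, here is a sketch of a different function that returns a list. The name `spread` and its components are invented for this example; they are not built into Splus and are not the answer to the test1 exercise.

```r
# spread is a hypothetical function, not part of the lab or of Splus
spread <- function(m){
  lo <- min(m)
  hi <- max(m)
  list(smallest = lo, largest = hi)   # the last expression is what the function returns
}

out <- spread(c(4, 9, 1))
out$smallest    # the components are picked out by name
out$largest
```

The value of the last expression in the body is what comes back, so wrapping several results in `list(...)` returns them all at once.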

#### An Example Of Doing Some Statistics With Splus

First we'll get you some data. The data set car.all (stored as an Splus data frame) lives in one of the Splus data directories. Try the following:

> names(car.all)
> car <- car.all[sample(1:111,40),c("We","HP","Pri","Mil","Cou")]

You might want to check the help files for sample.

> car
> attributes(car)
> attributes(car\$Country)

What did each of those commands do? What are attributes?

You now have a data frame called car whose 40 rows were sampled at random from the 111 rows of car.all, and whose 5 columns were identified uniquely by the first few letters of column names from car.all. (Splus tries very hard to make sense of what you type. If it can identify a unique component of a list or data frame from the first few letters it will not demand the whole name. Beware: there are some places where abbreviations can lead to results that you might not expect. Splus tries hard, but it can't read your mind.)

Try the following commands and interpret them carefully:

> table(car\$Co)
> win.graph()

This is how you plot variables:

> plot(car\$W,car\$P)
> plot(car\$W,car\$M)
> plot(1/car\$W,car\$M)
> pairs(car)

What was plotted in each case?

We will try to fit a linear model in which mileage is predicted by a linear function of weight. Here's how to do it:

> lm(car\$Mil ~car\$We)

An error message by any chance?

> lm(car\$Mil ~car\$We,na.action=na.omit)

Any better? Let's fit the model again but store the output in an Splus object.

> reg1 <- lm(car\$Mil ~car\$We,na.action=na.omit)
> reg1

Do you understand the output?

> names(reg1)

So those are the names of the components of reg1.

> plot(reg1\$fit,reg1\$res)

Yes, a plot of residuals versus fitted values.

Try modelling mileage as a linear function of weight and price:

> reg2 <- lm(car\$Mil ~car\$We+car\$Pr,na.action=na.omit)

As you can see, there is a special shorthand for describing statistical models. The same shorthand is used for many statistical functions, but (unfortunately) not for all statistical functions. The bits of Splus that have survived from its original incarnation usually offer fewer bells and whistles than their fancier, more recent offspring. Splus is still growing.
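The model shorthand can also be written with bare column names plus a `data` argument naming the data frame, which avoids repeating the data frame name in every term. A minimal sketch, assuming a made-up data frame `toy` (its columns and numbers are invented for illustration and have nothing to do with car.all):

```r
# toy is an invented data frame; the numbers are made up for illustration
toy <- data.frame(w = c(1, 2, 3, 4, 5),
                  y = c(2.1, 3.9, 6.2, 7.8, 10.1))

fit <- lm(y ~ w, data = toy)   # y modelled as intercept + slope * w
coef(fit)                      # the two estimated coefficients
names(fit)                     # components stored in the fitted object
```

Adding a second predictor works the same way as in reg2: put it on the right-hand side of the `~`, joined with `+`.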

#### Graphics

Try:

> xvalues <- seq(-2,2,by=0.05)
> plot(xvalues,exp(-xvalues^2))

Use the help for "plot" to figure out how to get a smooth line for the plot (connect the dots). Improve the plot by figuring out how to put a title on it and change the axis labels.

#### Saving your S objects and getting them back

There are several ways to save data. Some ways leave the data in a form only understood by Splus. You will learn more about this in Lab 2. For the moment, it's enough to learn how to preserve data from one Splus session to the next.

You have been working for a while, creating S objects. This is what you have so far:

> objects()
[1] ".Last.fixed" ".Last.value" "last.dump"
[4] "ll"          "M"          "test1"
[7] "test2"       "test3"       "values"
[10] "x"           "xvalues"     "v"
[13] "z"

Now you need a break. You put your floppy disk into the slot (it becomes drive A:), then dump (everything):

> dump(objects(),"A:\\jan16")
[1] "A:\\jan16"

For the purposes of a test, you kill everything, then try to restore it from your floppy disk.

> remove(objects())
> objects()
character(0)

Yes, they're gone but you can get them back!

> source("A:\\jan16")
> objects()
[1] ".Last.fixed" ".Last.value" "last.dump"
[4] "ll"          "M"          "test1"
[7] "test2"       "test3"       "values"
[10] "x"           "xvalues"     "v"
[13] "z"
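The same dump-and-restore round trip can be tried on a single object without a floppy disk. This is a sketch under assumptions: the object `w` and the file name held in `tmp` are made up for the example, and `tempfile()` is used as a stand-in for the `A:` path.

```r
# w and tmp are hypothetical names used only for this sketch
w <- c(1.5, 2.5)
tmp <- tempfile()     # a scratch file name, standing in for the floppy disk path

dump("w", tmp)        # write the object out as text commands
rm(w)                 # remove it from the session
source(tmp)           # read the commands back in, recreating w
w
```

Because the dumped file is plain text, you can open it in an editor to see exactly what was saved.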