We frequently need to read data from external sources into Splus. We might find data on the internet, or we might have data on a disk that we need to analyze using Splus. Today we learn how to read data from such external sources. We also learn about another type of Splus object called a data frame, and we learn more about where Splus stores the variables that you define.
Throughout the notes for this lab you will see ?commandname. When you see one of these, we suggest that you read the Splus help file for that command. We want you to fully understand these commands, and the help files are a great place to start.
Click on the link to the U.S. Census Bureau. This is how you navigate your way through to the data for New Haven County:
> age <- read.table("NHage.txt")
That didn't work. The next command might give you a reason why.
> count.fields("NHage.txt")
When you got the data from the Census Bureau, they mentioned that it was in tab-delimited format. That was an important piece of information.
> age<-read.table("NHage.txt",sep="\t")
Almost! Yes, the "\t" denotes a tab. We still have a problem with the first line of the data set, but it is easily rectified.
> age<-read.table("NHage.txt",sep="\t",header=T)
It finally worked on the last attempt. It wasn't that hard, really.
What happened? (Hint: ?read.table) See HELP for other ways of getting data into Splus.
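The sequence of attempts above can be reproduced on a toy file. The following is a minimal sketch, not the Census data: the file contents, the column names Area, under18 and over18, and the object names tmp, n.ws, n.tab and mini are all made up for illustration. It also sets row.names=1 explicitly, so that it does not depend on how read.table chooses row labels by default.

```r
# Write a small, made-up tab-delimited file to a temporary location.
tmp <- tempfile()
cat("Area\tunder18\tover18\n",
    "Ansonia town\t10\t30\n",
    "New Haven town\t20\t60\n",
    file = tmp, sep = "")

# Splitting on white space: the spaces inside the labels count as
# separators, so the lines do not all have the same number of fields.
n.ws <- count.fields(tmp)

# Splitting on tabs only: every line has the same number of fields.
n.tab <- count.fields(tmp, sep = "\t")

# Tab-separated, first line holds the column names, first column holds
# the row labels (row.names=1 is an explicit choice made for this sketch).
mini <- read.table(tmp, sep = "\t", header = T, row.names = 1)
mini["New Haven town", ]

unlink(tmp)   # remove the temporary file
```

Comparing n.ws with n.tab is a quick way to see whether the separator, rather than the data, is the source of the trouble.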
> attributes(age)
> age[5,]
> age[,4]
> age["New Haven town",]
> age[,"P0130001"]
> age$P0130001
As you can see, the data frame behaves both like a matrix and like a list. This can be very useful.
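The same selections can be tried on a tiny data frame built by hand. This is only a sketch: the object name towns, the column names and the numbers are invented for illustration and have nothing to do with the Census file.

```r
# A made-up data frame with labelled rows.
towns <- data.frame(under18 = c(10, 20, 30),
                    over18  = c(30, 60, 90),
                    row.names = c("Ansonia town", "New Haven town",
                                  "Hamden town"))

towns[2, ]                  # matrix style: the second row
towns[, 2]                  # matrix style: the second column
towns["New Haven town", ]   # a row selected by its label
towns[, "over18"]           # a column selected by its name
towns$over18                # list style: the same column
towns[["over18"]]           # list style again, using the name as a string
```

The last four column selections all return the same vector, which is the sense in which the data frame acts as a matrix and as a list at once.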
Save the first five rows of the data frame as age5. Use age5 as the test case for the first problem. It is a good idea to experiment with small data sets whenever you are trying to get a function to work.
Try dim(age), dimnames(age). Try running attributes() on each column of the data frame. (You can select columns by name or number, as for matrices, or by using the $ notation for lists.) Try
> lapply(age,attributes)
Yes, the command found the attributes of each column of the data frame. What do all those attributes mean?
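Because a data frame is a list of columns, lapply() visits one column at a time. A small sketch on invented data (the names toy, count, kind and col.attr are assumptions made for this example) shows the shape of the result without giving away what the attributes of age mean:

```r
# A made-up data frame with one numeric column and one factor column.
toy <- data.frame(count = c(1, 2, 3),
                  kind  = factor(c("town", "city", "town")))

# Apply attributes() to each column in turn; the result is a list
# with one component per column, named after the columns.
col.attr <- lapply(toy, attributes)

names(col.attr)    # the column names of toy
col.attr$kind      # the attributes stored with the factor column
col.attr$count     # the attributes, if any, of the numeric column
```

A factor column typically carries more attributes than a plain numeric column, which is worth keeping in mind when reading the output for age.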
Build yourself a function that takes a data frame as its argument and carries out the following operations. Assume that you will be feeding in a data frame whose rows have labels like "Ansonia town", ... When building the function, we recommend that you start by writing a function that does step 1, then edit it to do steps 1 and 2, etc.
Important HELP concerning saving your work.
> x <- 3
> foo<-function(y){ x<- 15; y + x }
> foo(4)
[1] 19
> foo(x)
[1] 18
> x
[1] 3
Explain what happens. For more details, pay attention to the Evaluation Frames section.
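One way to probe the question is to compare foo with a function that never assigns to x. The sketch below is an illustration only; bar is a hypothetical name introduced for the comparison.

```r
x <- 3

# No assignment to x inside the function, so the x defined outside is used.
bar <- function(y) { y + x }

# Same as foo above: the assignment creates a separate, local x that
# disappears when the function returns.
foo <- function(y) { x <- 15; y + x }

bar(4)   # 4 + 3
foo(4)   # 4 + 15
x        # still 3: foo did not change the x defined outside
```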
I assume you still have the data frame age.split lying around. Try
> search()
This function returns the list of directories, in order, where Splus looks for objects. This list is called the "search path".
> objects()
> attach(age.split,1)
This puts the data frame age.split in the first position on the search list.
> total <- under18 + over18
> search()
> objects()
> detach(1,save="junk")
> objects()
> junk
Explain what happened.
Note: If you create an object called foo in your working directory, and if there is another object called foo further down in the search list, Splus will find your foo first. Your foo object masks the other foo object. Be careful that you don't accidentally mask a system object, such as the t() function or the c() function. You should take heed of any warnings about masking. See ?masked.
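Masking is easy to demonstrate, and to undo, with a throwaway object. This is a sketch only: the replacement function and the names m, masked.result and system.result are invented for illustration, and Splus may print a warning when the masking object is created.

```r
m <- matrix(1:4, 2, 2)

# Defining our own t in the working directory masks the system t().
t <- function(x) "my own t, not a transpose"
masked.result <- t(m)      # calls our function

# Removing our object makes the system function visible again.
rm(t)
system.result <- t(m)      # the usual transpose
```

Removing the masking object is usually all that is needed to get the system version back.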
We have created a special library for Stat200 at
h:\\classes\\stat200
The Splus command
library(lib.loc="h:\\classes\\stat200")
gives a list of all the sections in the stat200 library, and also all the sections in the default system library. You will see that one of the available sections is called "nci". Nothing has been loaded yet.
If you want to find out about the nci section of the library, type
library(help=nci,lib.loc="h:\\classes\\stat200")
If you want to attach the nci section of the library to your search path, type:
library(nci,lib.loc="h:\\classes\\stat200")
Then Splus has access to all the data in the nci section of the library. (Try search() after you attach the library.)