The read.table() function

The read.table() function is intended for use with text files where each line consists of the same number of fields, separated by a fixed "sep" character (such as a colon, or a tab, or just white space).

Typically you will be saving data sets from the WWW to a machine in the Statlab. Make sure you save as a text file. It is best to save in the directory c:\users, because Splus looks in that directory automatically.

For example, suppose you have saved a file called pop.txt in that directory, and that the fields in the file are separated by tab characters. Then you would type

mydata <- read.table("pop.txt",sep="\t")
to read the data into a dataframe that you call mydata.

If the dataframe is not successfully created, you might get a message about the file not being found: make sure the file is in the directory where Splus looks, or give an explicit path name if you have saved it somewhere else. You might also get a message about a variable number of fields: try count.fields(), or look at the file with an editor to verify that all lines have the same number of fields, because read.table() needs the same number of fields on every line. A common source of error is an incorrectly specified field separator.
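
To make the count.fields() suggestion concrete, here is a minimal sketch. It is written for R and assumes count.fields() behaves the same way in Splus. The file contents are made up for illustration, and tempfile() and writeLines() are used only so the example can run on its own.

```r
# Build a small tab-separated file whose third line is missing a field
# (hypothetical data, for illustration only).
tmp <- tempfile(fileext = ".txt")
writeLines(c("Alabama\t4040\t1990",
             "Alaska\t550\t1990",
             "Arizona\t3665"), tmp)

# count.fields() returns the number of fields found on each line.
nf <- count.fields(tmp, sep = "\t")
print(nf)

# Flag the lines whose field count differs from the most common count.
usual <- as.numeric(names(which.max(table(nf))))
bad <- which(nf != usual)
print(bad)
```

The line numbers in bad tell you where to look when you open the file in an editor.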

Often the first line of a file is not data, but a header line containing the names of the variables. If no header is present, Splus makes up names like V1, V2, ... If there is a header line, you need to read the data in with a command like

mydata <- read.table("pop.txt",sep="\t",header=T)
Always look at the first few rows of the dataframe after a read.table(), to make sure that you haven't accidentally converted the first row of data into a very weird header, or forced a header line into the data.
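
The check described above might look like the following sketch. It is written for R, on the assumption that read.table() treats the header argument the same way in Splus. The file and its column names (state, pop) are hypothetical.

```r
# A small tab-separated file with a header line (made-up data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("state\tpop",
             "Alabama\t4040",
             "Alaska\t550"), tmp)

# Without header=TRUE the header line is read as a row of data.
wrong <- read.table(tmp, sep = "\t")
# With header=TRUE the first line supplies the variable names.
right <- read.table(tmp, sep = "\t", header = TRUE)

# Look at the first rows of each result.
print(wrong[1:2, ])
print(right[1:2, ])
```

In the first dataframe the names state and pop show up as a row of data under the made-up names V1 and V2, and the population column is no longer numeric. That is the kind of symptom to look for.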

The scan() function

For masochists, or those with data that is not neatly arranged in a file, try scan(). I find myself using scan() quite a lot when read.table() is either too slow or creates factors where I don't want them. scan() takes more work, but it gives more control. The read.table() function is actually built around scan(). Have a look at the code for read.table() sometime.
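
As a rough illustration of the extra control, here is a minimal sketch of reading a two-column tab-separated file with scan(). It is written for R and assumes the what and sep arguments work the same way in Splus. The file contents and the names state and pop are made up for the example.

```r
# A small tab-separated file with no header line (made-up data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("Alabama\t4040",
             "Alaska\t550"), tmp)

# The what= list gives the name and type of each field:
# "" means character, 0 means numeric.
cols <- scan(tmp, what = list(state = "", pop = 0), sep = "\t")

# cols is a list with one vector per field.
print(cols$state)
print(cols$pop)
```

Because you state the type of each field yourself, the character column stays character instead of being turned into a factor.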

For some very large data sets, Splus can crash with read.table(), but scan() might work.