
Reading PISA data into R - read.table error

I am trying to read data from the PISA 2012 study ( http://pisa2012.acer.edu.au/downloads.php ) into R using the read.table function. This is the code I tried:

pisa  <- read.table("pisa2012.txt", sep = "")    

Unfortunately, I keep getting the following error message:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  
: line 2 did not have 184 elements    

I have also tried setting

header = T

but then I get the following error message:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  
 :line 1 did not have 184 elements

Lastly, this is what the .txt file looks like ...

http://postimg.org/image/4u9lqtxqd/

Thanks for your help!

You can see from the first line that you'll need some sort of control file to delimit the individual variables. From working with PISA in other environments, I know the first three columns correspond to the ISO 3-letter country code (e.g., ALB). What follows is a run of numbers and letters that only makes sense once it is split into individual variables at the right positions. You could use the codebook for this ( https://pisa2012.acer.edu.au/downloads/M_stu_codebook.pdf ), but doing that for every single variable is a real chore. Why not download the data in SPSS or SAS format and import that? Not a 'slick' solution, but without a control file you'd have a lot of manual work to do.
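To illustrate what "delimiting the variables" means for a fixed-width file, here is a minimal sketch using base R's read.fwf on a toy file. The two data lines, the widths and the variable names are all made up for the example; they are not the real PISA layout, which has to come from the codebook or a control file.

```r
# Toy fixed-width file: the layout below is invented for illustration,
# it is NOT the real PISA layout.
tmp <- tempfile(fileext = ".txt")
writeLines(c("ALB0000001F15",
             "ALB0000002M16"), tmp)

# Hypothetical control information: one width and one name per variable.
widths <- c(3, 7, 1, 2)
vars   <- c("CNT", "STUDENT_ID", "GENDER", "AGE")

# colClasses keeps IDs as text (leading zeros survive) and stops a lone
# "F" from being interpreted as the logical value FALSE.
toy <- read.fwf(tmp, widths = widths, col.names = vars,
                colClasses = c("character", "character", "character", "integer"),
                stringsAsFactors = FALSE)
print(toy)
```

There are no separators in such a file, which is why read.table with sep = "" cannot find a consistent number of fields per line.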

I just read the files using the readr package. You will need: the readr package, the TXT file, the SAScii package, and the associated SAS file.

So, let's say you want to read the student file. Then you will need the following files: INT_STU12_DEC03.txt and INT_STU12_DEC03.sas.

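##################### READING STUDENT DATA  ###################
library(readr)   # read_fwf(), fwf_widths()
library(SAScii)  # parse.SAScii()

## Loading the dictionary (variable names and widths) from the SAS control file
## (use the name of the SAS control file you actually downloaded)
dic_student <- parse.SAScii(sas_ri = 'INT_STU12_SAS.sas')

## Creating the positions for read_fwf and reading the fixed-width file
student <- read_fwf(file = 'INT_STU12_DEC03.txt', col_positions = fwf_widths(dic_student$width), progress = TRUE)
## Naming the columns after the variables in the dictionary
colnames(student) <- dic_student$varname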
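Before reading the full file, it may be worth checking that the widths in the dictionary add up to the length of a data line. The sketch below does this on toy stand-ins: the data lines and the dic data frame are invented, and only mimic the varname and width columns that parse.SAScii returns. As far as I recall, SAScii reports skipped gaps as negative widths, hence the abs(); treat that as an assumption and inspect your own dictionary.

```r
# Toy stand-ins: made-up data lines and a made-up dictionary with the same
# column names as the parse.SAScii() result used above (varname, width).
tmp <- tempfile(fileext = ".txt")
writeLines(c("ALB0000001F15", "ALB0000002M16"), tmp)
dic <- data.frame(varname = c("CNT", "STUDENT_ID", "GENDER", "AGE"),
                  width   = c(3, 7, 1, 2),
                  stringsAsFactors = FALSE)

# Compare the length of the first data line with the total layout width.
first_line   <- readLines(tmp, n = 1)
line_width   <- nchar(first_line)
layout_width <- sum(abs(dic$width))  # abs(): assumes gaps are stored as negative widths

widths_match <- line_width == layout_width
if (!widths_match) {
  warning("Layout covers ", layout_width,
          " characters, but the first line has ", line_width)
}
```

If the two numbers differ, the control file and the data file probably do not belong together, or the dictionary contains gaps that need handling before the widths are passed to fwf_widths.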

OBS 1: As I'm using Linux, I needed to delete the first lines from the SAS file and change its encoding to UTF-8.

OBS 2: The deleted lines were:

libname  M_DEC03 "C:\XXX"; 
filename STU "C:\XXX\INT_STU12_DEC03.txt"; 
options nofmterr;
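The clean-up can also be done from R instead of by hand. This is only a sketch on a toy control file: the temporary file names and the INPUT statement are made up, and the source encoding ("latin1") is an assumption, so check what your copy of the SAS file actually uses. Note that the pattern drops every line starting with libname, filename or options, not only the first three.

```r
# Toy stand-in for the SAS control file; the data step is invented.
sas_in  <- tempfile(fileext = ".sas")
sas_out <- tempfile(fileext = ".sas")
writeLines(c('libname  M_DEC03 "C:\\XXX";',
             'filename STU "C:\\XXX\\INT_STU12_DEC03.txt";',
             'options nofmterr;',
             'data student;',
             'input CNT $ 1-3;',
             'run;'), sas_in)

# Assumption: the original file is in a Windows/Latin-1 encoding.
lines <- readLines(sas_in, warn = FALSE)
lines <- iconv(lines, from = "latin1", to = "UTF-8")

# Drop the libname / filename / options statements (case-insensitive).
drop  <- grepl("^\\s*(libname|filename|options)\\b", lines,
               ignore.case = TRUE, perl = TRUE)
clean <- lines[!drop]

# Write the cleaned, UTF-8 encoded control file.
writeLines(clean, sas_out, useBytes = TRUE)
```

The cleaned file (sas_out here) is what would then be passed to parse.SAScii.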

OBS 3: The dataset takes about 1 GB, so you will need enough RAM.
