Reading PISA data into R - read.table error

Question

I am trying to read data from the PISA 2012 study ( http://pisa2012.acer.edu.au/downloads.php ) into R using the read.table function. This is the code I tried:

pisa  <- read.table("pisa2012.txt", sep = "")

Unfortunately, I keep getting the following error message:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 2 did not have 184 elements

I have also tried setting

header = TRUE

but then I get the following error message:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 1 did not have 184 elements
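Both errors come from read.table() splitting each line on whitespace (sep = ""). In a fixed-width file the spaces only appear where a field happens to be blank, so different lines yield different numbers of fields, and scan() stops at the first line whose count differs. A minimal sketch with base R's count.fields() on two made-up lines (not real PISA records) shows the effect:

```r
# Two made-up lines standing in for fixed-width records (NOT real PISA data).
# Spaces fall in different places, as they do when some fields are blank.
lines <- c("ALB0000112 3 A",
           "ALB00002125 B")

# count.fields() splits on whitespace, just like read.table(sep = "") does
con <- textConnection(lines)
n_fields <- count.fields(con, sep = "")
close(con)

print(n_fields)  # [1] 3 2 -> the field count differs between lines
```

Running count.fields() on the first few lines of the real file is a quick way to confirm that it is fixed-width rather than delimited.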

Lastly, this is what the .txt file looks like:

http://postimg.org/image/4u9lqtxqd/

Thanks for your help!

Answer 1

You can see from the first line that you'll need some sort of control file to delimit the individual variables. From working with PISA in other environments, I know the first three columns correspond to the ISO three-letter country code (e.g., ALB). What follows is a run of numbers and letters that only becomes meaningful once it is split into separate variables. You could use the codebook for this ( https://pisa2012.acer.edu.au/downloads/M_stu_codebook.pdf ), but doing that by hand for every single variable is a real bear. Why not download the SPSS or SAS version and import that instead? Not a 'slick' solution, but without a control file you'd have a lot of manual work to do.
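Once the column layout is known, base R's read.fwf() can split each line by character position instead of by separator. A minimal sketch, assuming a purely hypothetical layout of a 3-character country code, a 5-digit ID and a 2-digit item; the real widths would have to come from the codebook or a control file:

```r
# Hypothetical fixed-width records: 3-char country code, 5-digit ID,
# 2-digit item. These widths are made up for illustration only.
records <- c("ALB0000112",
             "ALB0000207")

con <- textConnection(records)
pisa_toy <- read.fwf(con,
                     widths     = c(3, 5, 2),
                     col.names  = c("CNT", "ID", "ITEM"),
                     colClasses = c("character", "integer", "integer"))
close(con)

print(pisa_toy)
```

With hundreds of variables, typing the widths and names by hand is exactly the manual work described above, which is why a machine-readable control file is worth having.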

Answer 2

I read the files using the readr package. You will need: the readr package, the SAScii package, the .txt data file and the associated SAS control (.sas) file.

For example, to read the student data you need the following files: INT_STU12_DEC03.txt and INT_STU12_DEC03.sas.

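##################### READING STUDENT DATA  ###################
library(readr)   # read_fwf(), fwf_widths()
library(SAScii)  # parse.SAScii()

## Loading the dictionary (variable names and widths) from the SAS control file;
## adjust the file name to match the .sas file you downloaded
dic_student <- parse.SAScii(sas_ri = 'INT_STU12_SAS.sas')

## Reading the fixed-width file, using the dictionary's widths and variable names
student <- read_fwf(file = 'INT_STU12_DEC03.txt',
                    col_positions = fwf_widths(dic_student$width, dic_student$varname),
                    progress = TRUE)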
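Before reading a file of this size, it may be worth checking that the dictionary's widths add up to the record length of the data file. A minimal base-R sketch; check_record_length is a hypothetical helper, and the use of abs() assumes that skipped columns, if any, are encoded as negative widths:

```r
# Hypothetical helper: compare the total width implied by a dictionary
# (e.g. dic_student$width) with the length of the first few records.
# abs() counts any negative (skipped-column) widths as occupied characters.
check_record_length <- function(txt_path, widths, n = 5) {
  lens     <- nchar(readLines(txt_path, n = n, warn = FALSE), type = "bytes")
  expected <- sum(abs(widths))
  list(expected = expected, observed = lens, ok = all(lens == expected))
}

# Toy example with a temporary file (made-up data, not PISA records)
tmp <- tempfile(fileext = ".txt")
writeLines(c("ALB0000112", "ALB0000207"), tmp)
res <- check_record_length(tmp, widths = c(3, 5, 2))
print(res$ok)
```

With the real files this would be called as check_record_length('INT_STU12_DEC03.txt', dic_student$width). A FALSE result suggests the .sas file and the .txt file do not match, or that the records carry trailing padding.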

OBS 1: As I'm using Linux, I needed to delete the first lines of the SAS file and change its encoding to UTF-8.

OBS 2: The deleted lines were:

libname  M_DEC03 "C:\XXX"; 
filename STU "C:\XXX\INT_STU12_DEC03.txt"; 
options nofmterr;
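The manual edit can also be scripted. A minimal sketch; clean_sas_control is a hypothetical helper, and the source encoding "latin1" is an assumption you should check against your copy of the file:

```r
# Hypothetical helper: drop the environment-specific libname/filename/options
# statements from a SAS control file and re-encode it as UTF-8.
# The source encoding ("latin1") is an assumption.
clean_sas_control <- function(in_path, out_path, from_encoding = "latin1") {
  sas  <- readLines(in_path, warn = FALSE)
  sas  <- iconv(sas, from = from_encoding, to = "UTF-8")
  keep <- !grepl("^\\s*(libname|filename|options)\\b", sas,
                 ignore.case = TRUE, perl = TRUE)
  writeLines(sas[keep], out_path, useBytes = TRUE)  # bytes are already UTF-8
  invisible(out_path)
}

# Toy control file mimicking the lines quoted above plus two INPUT lines
tmp_in  <- tempfile(fileext = ".sas")
tmp_out <- tempfile(fileext = ".sas")
writeLines(c('libname  M_DEC03 "C:\\XXX"; ',
             'filename STU "C:\\XXX\\INT_STU12_DEC03.txt"; ',
             'options nofmterr;',
             'INPUT',
             '  CNT $ 1-3'), tmp_in)
clean_sas_control(tmp_in, tmp_out)
print(readLines(tmp_out))
```

The cleaned file can then be passed to parse.SAScii() as sas_ri.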

OBS 3: The dataset takes about 1 GB, so you will need enough RAM.
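If memory is tight, one option is to read only the variables you need by giving readr explicit start/end positions instead of every width. A minimal sketch; select_fwf_positions is a hypothetical helper that assumes a dictionary with varname and width columns (the shape parse.SAScii returns), and the toy names and widths below are made up:

```r
# Hypothetical helper: turn a dictionary with columns varname and width into
# start/end positions, keeping only the requested variables. abs() handles
# skipped columns in case they are encoded as negative widths.
select_fwf_positions <- function(dic, keep) {
  w     <- abs(dic$width)
  end   <- cumsum(w)
  start <- end - w + 1
  idx   <- match(keep, dic$varname)
  if (anyNA(idx)) {
    stop("unknown variable(s): ", paste(keep[is.na(idx)], collapse = ", "))
  }
  data.frame(varname = keep, start = start[idx], end = end[idx],
             stringsAsFactors = FALSE)
}

# Toy dictionary with made-up names and widths (not the real PISA layout)
dic <- data.frame(varname = c("CNT", "VAR_A", "VAR_B"),
                  width   = c(3, 7, 7),
                  stringsAsFactors = FALSE)
pos <- select_fwf_positions(dic, c("CNT", "VAR_B"))
print(pos)

# With the real files, the positions could then be passed to readr, e.g.:
# student <- readr::read_fwf('INT_STU12_DEC03.txt',
#   col_positions = readr::fwf_positions(pos$start, pos$end, pos$varname))
```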

solution1
0 2015-10-07 09:40:16

solution2
0 2015-10-25 05:40:52