简体   繁体   中英

write and read.csv different number of columns

A strange problem with write and read.csv . I have ways to work around this but would be great if someone can identify what is going on.

I have code from someone else which dynamically creates a series of CSVs by appending new rows. The problem is that read.csv appears to read the newly created csv inconsistently.

Dummy code example:

datfile <- "E:/temp.csv"
write(paste("Name","tempname",sep=","),datfile,1)
write(paste("VShort",50,sep=","),datfile,1,append=T)
write(paste("Short1",1,1,sep=","),datfile,1,append=T)
write(paste("Short2",0,2,sep=","),datfile,1,append=T)
write(paste("Short3",0,2,sep=","),datfile,1,append=T)
write(paste("Long",0,0.3,0.6,1,sep=","),datfile,1,append=T)
write(paste("Short4",2,0,sep=","),datfile,1,append=T)
read.csv(datfile,header=F,colClasses="character")

Seven rows of data written to CSV, but read.csv reads in 8 rows ( Long is split over two rows). Eight rows and three columns read in.

The problem is fixed by opening temp.csv in Excel and saving. Then read.csv reads in the 7 lines appropriately.

The problem only appears to exist under certain conditions. For example, remove Short 3 and there is no problem:

datfile2 <- "E:/temp2.csv"
write(paste("Name","tempname",sep=","),datfile2,1)
write(paste("VShort",50,sep=","),datfile2,1,append=T)
write(paste("Short1",1,1,sep=","),datfile2,1,append=T)
write(paste("Short2",0,2,sep=","),datfile2,1,append=T)
write(paste("Long",0,0.3,0.6,1,sep=","),datfile2,1,append=T)
write(paste("Short4",2,0,sep=","),datfile2,1,append=T)
read.csv(datfile2,header=F,colClasses="character") 

Six rows and five columns are read in.

Any ideas what is going on here?

R version 3.2.4 Revised

Windows 10

This is probably related to the following in ?read.csv :

The number of data columns is determined by looking at the first five lines of input (or the whole file if it has less than five lines), or from the length of col.names if it is specified and is longer. This could conceivably be wrong if fill or blank.lines.skip are true, so specify col.names if necessary (as in the 'Examples').

It just happens that the row with the most number of columns is the sixth row in your first example.

I suggest using col.names to get around this, eg:

`... read.csv(..., col.names = paste0('V', 1:6))`

As the OP notes in a comment to this answer, you can find out the number of columns required using readLines :

Ncol <- max(unlist(lapply(strsplit(readLines(datfile), ","), length)))

and then modify the above to give:

read.csv(datfile,header=F,colClasses="character", col.names=paste0("V", 1:Ncol))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM