A strange problem with write
and read.csv
. I have ways to work around this but would be great if someone can identify what is going on.
I have code from someone else which dynamically creates a series of CSVs by appending new rows. The problem is that read.csv
appears to read the newly created csv inconsistently.
Dummy code example:
datfile <- "E:/temp.csv"
write(paste("Name","tempname",sep=","),datfile,1)
write(paste("VShort",50,sep=","),datfile,1,append=T)
write(paste("Short1",1,1,sep=","),datfile,1,append=T)
write(paste("Short2",0,2,sep=","),datfile,1,append=T)
write(paste("Short3",0,2,sep=","),datfile,1,append=T)
write(paste("Long",0,0.3,0.6,1,sep=","),datfile,1,append=T)
write(paste("Short4",2,0,sep=","),datfile,1,append=T)
read.csv(datfile,header=F,colClasses="character")
Seven rows of data written to CSV, but read.csv
reads in 8 rows ( Long
is split over two rows). Eight rows and three columns read in.
The problem is fixed by opening temp.csv in Excel and saving. Then read.csv
reads in the 7 lines appropriately.
The problem only appears to exist under certain conditions. For example, remove Short 3
and there is no problem:
datfile2 <- "E:/temp2.csv"
write(paste("Name","tempname",sep=","),datfile2,1)
write(paste("VShort",50,sep=","),datfile2,1,append=T)
write(paste("Short1",1,1,sep=","),datfile2,1,append=T)
write(paste("Short2",0,2,sep=","),datfile2,1,append=T)
write(paste("Long",0,0.3,0.6,1,sep=","),datfile2,1,append=T)
write(paste("Short4",2,0,sep=","),datfile2,1,append=T)
read.csv(datfile2,header=F,colClasses="character")
Six rows and five columns are read in.
Any ideas what is going on here?
R version 3.2.4 Revised
Windows 10
This is probably related to the following in ?read.csv
:
The number of data columns is determined by looking at the first five lines of input (or the whole file if it has less than five lines), or from the length of col.names if it is specified and is longer. This could conceivably be wrong if fill or blank.lines.skip are true, so specify col.names if necessary (as in the 'Examples').
It just happens that the row with the most number of columns is the sixth row in your first example.
I suggest using col.names
to get around this, eg:
`... read.csv(..., col.names = paste0('V', 1:6))`
As the OP notes in a comment to this answer, you can find out the number of columns required using readLines
:
Ncol <- max(unlist(lapply(strsplit(readLines(datfile), ","), length)))
and then modify the above to give:
read.csv(datfile,header=F,colClasses="character", col.names=paste0("V", 1:Ncol))
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.