
Error Reading a CSV File in R

I am trying to read a set of files from http://www.ercot.com/gridinfo/load/load_hist . All of the files are read properly with read.csv except the last one, the file for 2017. When I attempt to read that file with read.csv, I get the following error:

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : scan() expected 'a real', got '"8'

However, I have checked with Excel and there is no "8 or 8 value in the file. The error message seems clear, but I can't find the "8 or 8, and I get the same error even if I read 0 rows (with the nrows argument of the read.csv function).

 hold2  <- read.csv(paste(PATH, "\\CSV\\", "native_load_2017.csv", sep=""), header=TRUE, sep=",", dec = ".", colClasses=c("character",rep("double",9)))

hold2  <- read.csv(paste(PATH, "\\CSV\\", "native_load_2017.csv", sep=""), header=TRUE, sep=",", dec = ".", colClasses=c("character",rep("double",9)), nrows=0)

Also, the last row of the file contains values that do not follow the format used in the rest of the file. I would like to skip the last line, but read.csv has no argument for this. Is there a workaround? I am thinking of using something like:

hold2  <- read.csv(paste(PATH, "\\CSV\\", "native_load_2017.csv", sep=""), header=TRUE, sep=",", dec = ".", colClasses=c("character",rep("double",9)), nrows=nrow(read.csv(paste(PATH, "\\CSV\\", "native_load_2017.csv", sep=""))) - 1)

Any thoughts on how best to do this? Thanks

Using the readr package

> df <- readr::read_csv("~/Desktop/native_load_2017.csv")
Parsed with column specification: 
cols(   
`Hour Ending` = col_character(),
 COAST = col_number(),
 EAST = col_number(),
 FWEST = col_number(),
 NORTH = col_number(),
 NCENT = col_number(),
 SOUTH = col_number(),
 SCENT = col_character(),
 WEST = col_number(),
 ERCOT = col_number()
)
>

You can see that the SCENT column is being parsed as character (because of the differently formatted values in the last row that you noted). Below, specifying the first column as character and col_number() as the default reads the file. Note that col_number() handles the commas and decimal points present in the columns you had declared as double.
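To illustrate why col_number() helps here, the following is a minimal sketch using made-up sample strings (not values taken from the actual file), on the assumption that the numeric fields are stored as quoted text with comma grouping. It compares base R's as.numeric() with readr::parse_number(), the vector-level parser behind col_number().

```r
library(readr)

# Hypothetical sample values, formatted like quoted numbers with comma grouping
x <- c("8,605.51", "1,234.00", "987.65")

# as.numeric() cannot convert the comma-grouped strings and yields NA for them
base_result <- suppressWarnings(as.numeric(x))

# parse_number() drops the grouping mark (a comma in the default locale) first
readr_result <- parse_number(x)

print(base_result)
print(readr_result)
```

If the real file is formatted this way, it would also explain the scan() error in the question: a field declared as "double" starts with a quote character and contains a comma, so it cannot be read as a real number.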

library(readr)  # makes cols(), col_character() and col_number() available
options(digits=7)
df <- read_csv("~/Desktop/native_load_2017.csv", col_types = cols(
  `Hour Ending` = col_character(),
  .default = col_number())
)
sapply(df, class)
# df[complete.cases(df), ]  # to remove the last row if needed
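If you would rather stay in base R and literally skip the last line, one possible approach is to read the file as text, drop the final line, and parse what remains. The sketch below runs on a small temporary file whose contents are invented for illustration. The column subset, the values, and the malformed last row are assumptions, not the real ERCOT data.

```r
# Hypothetical miniature file: quoted, comma-grouped numbers and a malformed last row
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  '"Hour Ending","COAST","EAST"',
  '"01/01/2017 01:00","8,605.51","1,120.10"',
  '"01/01/2017 02:00","8,512.30","1,101.75"',
  '"total",,"n/a"'
), tmp)

# Read the raw lines and drop the last one
lines <- head(readLines(tmp), -1)

# Parse everything as character first so the quoted numbers do not trip scan()
raw <- read.csv(text = lines, header = TRUE,
                colClasses = "character", check.names = FALSE)

# Strip the comma grouping and convert every column except the timestamp
num_cols <- setdiff(names(raw), "Hour Ending")
raw[num_cols] <- lapply(raw[num_cols],
                        function(v) as.numeric(gsub(",", "", v, fixed = TRUE)))

str(raw)
```

This avoids reading the file twice just to count its rows, which is what the nrows = nrow(...) - 1 idea in the question would do.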
