
Error Reading a CSV File in R

I am trying to read a set of files from http://www.ercot.com/gridinfo/load/load_hist . All of the files are read properly with read.csv except for the last one, the file for 2017. When I attempt to read that file with read.csv I get the following error:

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : scan() expected 'a real', got '"8'

However, I have checked with Excel and there is no "8 or 8 value in the file. The error message seems clear, but I can't find the "8 or 8, and I get the same error even if I read 0 rows (with the nrows argument of the read.csv function).

 hold2  <- read.csv(paste(PATH, "\\CSV\\", "native_load_2017.csv", sep=""), header=TRUE, sep=",", dec = ".", colClasses=c("character",rep("double",9)))

hold2  <- read.csv(paste(PATH, "\\CSV\\", "native_load_2017.csv", sep=""), header=TRUE, sep=",", dec = ".", colClasses=c("character",rep("double",9)), nrows=0)

Also, the last row of the file contains values that do not follow the format used in the rest of the file. I would like to skip the last line, but there is no argument in the read.csv function to do this. Is there any workaround? I am thinking of using something like:

hold2  <- read.csv(paste(PATH, "\\CSV\\", "native_load_2017.csv", sep=""), header=TRUE, sep=",", dec = ".", colClasses=c("character",rep("double",9)), nrows=nrow(read.csv(paste(PATH, "\\CSV\\", "native_load_2017.csv", sep="")))-1)

Any thoughts on how best to do this? Thanks

Using the readr package:

> df <- readr::read_csv("~/Desktop/native_load_2017.csv")
Parsed with column specification: 
cols(   
`Hour Ending` = col_character(),
 COAST = col_number(),
 EAST = col_number(),
 FWEST = col_number(),
 NORTH = col_number(),
 NCENT = col_number(),
 SOUTH = col_number(),
 SCENT = col_character(),
 WEST = col_number(),
 ERCOT = col_number()
)
>

You can see that the SCENT column is being parsed as character (due to the difference in the format of the values in the last row that you noted). Below, specifying the first column as character and the default as col_number() reads the file (note: col_number() handles the commas and decimal points present in the columns you had as double).

library(readr)  # attaches cols(), col_character() and col_number() used below

options(digits=7)
df <- read_csv("~/Desktop/native_load_2017.csv", col_types = cols(
  `Hour Ending` = col_character(),
  .default = col_number())
)
sapply(df, class)  # check the resulting column types
#df[complete.cases(df),] # to remove the last row if needed
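If you would rather stay in base R, here is a rough sketch of the same idea. The likely cause of the original error is that read.csv only handles quoting for columns read as character, so a quoted value such as "8,469.50" in a column declared as double gets split at the comma and scan() sees '"8'. The sample lines below are made up to mimic that layout (they are not the actual ERCOT data), and the malformed last row is only an assumption about what the real file contains.

```r
# Hypothetical sample mimicking the described layout: quoted numbers with
# thousands separators and a malformed last row (not the real ERCOT file).
csv_text <- c(
  'Hour Ending,COAST,EAST',
  '01/01/2017 01:00,"8,469.50","1,007.25"',
  '01/01/2017 02:00,"8,318.10",995.40',
  'not a data row,n/a,n/a'
)

# Drop the last line before parsing; head(x, -1) keeps all but the last element.
body <- head(csv_text, -1)

# Read every column as character so the quoted fields are kept whole.
raw <- read.csv(text = body, colClasses = "character", check.names = FALSE)

# Strip the thousands separators, then convert the load columns to numeric.
load_cols <- setdiff(names(raw), "Hour Ending")
raw[load_cols] <- lapply(
  raw[load_cols],
  function(x) as.numeric(gsub(",", "", x, fixed = TRUE))
)

str(raw)
```

For the real file you would replace csv_text with readLines() on your own path. Reading the lines first also gives you a way to skip the last line, which read.csv has no argument for.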
