Error in reading a CSV file with read.table()
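I am having trouble loading a CSV dataset in R. The dataset can be downloaded from
https://data.baltimorecity.gov/City-Government/Baltimore-City-Employee-Salaries-FY2015/nsfe-bg53
I imported the data with read.csv as shown below, and the dataset was imported correctly.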
EmpSal <- read.csv('E:/Data/EmpSalaries.csv')
I then tried reading the same file with read.table, and the resulting dataset had many anomalies when I viewed it.
EmpSal1 <- read.table('E:/Data/EmpSalaries.csv',sep=',',header = T,fill = T)
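The code above starts reading data from row 7. The dataset actually contains about 14K rows, but only about 5K rows were imported. In a few places, 15-20 rows are merged into a single row, and that whole row's data is shown in a single column.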
I can read the dataset with read.csv, but I would like to know why it does not work with read.table.
read.csv is defined as:
function (file, header = TRUE, sep = ",", quote = "\"", dec = ".",
fill = TRUE, comment.char = "", ...)
read.table(file = file, header = header, sep = sep, quote = quote,
dec = dec, fill = fill, comment.char = comment.char, ...)
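You need to add quote="\"". By default, read.table treats both double and single quotes as quoting characters, whereas read.csv treats only double quotes as quoting characters: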
EmpSal <- read.csv('Baltimore_City_Employee_Salaries_FY2015.csv')
EmpSal1 <- read.table('Baltimore_City_Employee_Salaries_FY2015.csv', sep=',', header = TRUE, fill = TRUE, quote="\"")
identical(EmpSal, EmpSal1)
# TRUE
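The same effect can be reproduced without downloading the file. This is a minimal sketch using a small hypothetical inline CSV (the column names and values are made up, not taken from the Baltimore data): apostrophes in unquoted fields are enough to make read.table's default quote setting merge rows, while quote = "\"" treats them as ordinary characters.

```r
# Hypothetical data (not from the Baltimore dataset): two unquoted
# fields contain an apostrophe.
csv_text <- "Name,Agency,Salary
Smith,Mayor's Office,50000
Jones,Police,60000
Brown,Sheriff's Office,55000"

# Default quote = "\"'": the ' in Mayor's opens a quoted field that only
# closes at the ' in Sheriff's, so the data lines collapse together.
with_default_quote <- read.table(text = csv_text, sep = ",",
                                 header = TRUE, fill = TRUE)

# Only double quotes act as quoting characters, as in read.csv.
with_double_quote <- read.table(text = csv_text, sep = ",",
                                header = TRUE, fill = TRUE, quote = "\"")

nrow(with_default_quote)  # fewer than 3 rows
nrow(with_double_quote)   # 3
identical(with_double_quote, read.csv(text = csv_text))  # TRUE
```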
As mentioned, read.csv() imported the data successfully without the quote argument being specified. The default value of quote is "\"" in read.csv but "\"'" in read.table. Compare the two signatures:
read.table(file, header = FALSE, sep = "", quote = "\"'",
dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
row.names, col.names, as.is = !stringsAsFactors,
na.strings = "NA", colClasses = NA, nrows = -1,
skip = 0, check.names = TRUE, fill = !blank.lines.skip,
strip.white = FALSE, blank.lines.skip = TRUE,
comment.char = "#",
allowEscapes = FALSE, flush = FALSE,
stringsAsFactors = default.stringsAsFactors(),
fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
read.csv(file, header = TRUE, sep = ",", quote = "\"",
dec = ".", fill = TRUE, comment.char = "", ...)
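The two defaults can also be read directly from the function objects with base R's formals(). A small sketch; nothing in it depends on the dataset:

```r
# Default value of the `quote` argument in each function's signature.
table_quote <- formals(utils::read.table)$quote
csv_quote   <- formals(utils::read.csv)$quote

# Split into individual characters: read.table quotes on both " and ',
# read.csv only on ".
strsplit(table_quote, "")[[1]]  # "\"" "'"
strsplit(csv_quote, "")[[1]]    # "\""
```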
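The data you pointed to contains many single quotes (apostrophes). Because read.table's default quote also includes ', each apostrophe opens a quoted field that only closes at the next ', so the separators and line breaks in between end up inside one field. That is why read.table merged rows and imported fewer rows than the file contains.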
Try the following code; it should work for you.
r<-read.table('/home/workspace/Downloads/Baltimore_City_Employee_Salaries_FY2015.csv',sep=",",quote="\"",header=T,fill=T)
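To check this diagnosis before choosing a quote setting, utils::count.fields() reports the number of fields per line for a given set of quoting characters; according to its documentation, a line where a quoted string starts and runs onto later lines gets NA. A minimal sketch, assuming hypothetical inline lines in place of readLines() on the real CSV:

```r
# Hypothetical lines; for the real file you might instead use
# raw_lines <- readLines("Baltimore_City_Employee_Salaries_FY2015.csv")
raw_lines <- c("Name,Agency,Salary",
               "Smith,Mayor's Office,50000",
               "Jones,Police,60000",
               "Brown,Sheriff's Office,55000")

# Count comma-separated fields per line for a given quote setting.
count_with_quote <- function(x, quote) {
  con <- textConnection(x)
  on.exit(close(con))
  utils::count.fields(con, sep = ",", quote = quote)
}

# Lines containing an apostrophe, i.e. candidates for trouble.
which(grepl("'", raw_lines, fixed = TRUE))  # 2 4

default_counts <- count_with_quote(raw_lines, "\"'")  # contains NA
double_counts  <- count_with_quote(raw_lines, "\"")   # 3 on every line
```

If the default setting produces NA counts while quote = "\"" gives a constant count, stray apostrophes are the likely cause.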