How to avoid quotes in CSV import using R
I am having problems reading the CSV file below (an extract) using R:
id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"
library(readr)
df <- read_csv("don.csv", quote = "")
gives me quotes in the cells, which I can clean up afterwards, but can this not be done more smoothly during the import?
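For comparison, the after-the-fact cleanup mentioned in the question might look like the sketch below. It swaps in base R's read.csv() for readr's read_csv() so that it needs no packages, and it assumes the literal string NULL should become NA; the data frame name raw is only for illustration.

```r
# The extract from the question, inlined so the sketch runs on its own
Lines <- 'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'

# With quote = "" the double quotes are kept as ordinary characters,
# so the first column holds "1 and the last one holds 110"
raw <- read.csv(text = Lines, quote = "", stringsAsFactors = FALSE)

# Strip the quotes from every character column, then let type.convert()
# turn the cleaned text back into numbers where possible.
# Assumption: the literal NULL should become NA.
raw[] <- lapply(raw, function(x) {
  if (is.character(x)) {
    type.convert(gsub('"', "", x), na.strings = "NULL", as.is = TRUE)
  } else {
    x
  }
})
str(raw)
```

This works, but every affected column has to be repaired after parsing; removing the quotes before parsing avoids that extra step.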
1) If the input contains no double quotes other than the unwanted ones, this works. If the input comes from a file, replace textConnection(Lines) with "don.csv".
L <- readLines(textConnection(Lines))
read.csv(text = gsub('"', '', L))
giving:
  id        created_date stars charity_id user_id is_anonymous user_country_id
1  1 2016-08-10 12:50:30   100      65536   32772         NULL             110
2 65 2016-11-09 07:57:32    50     425986 2686978            1             110
3 66 2016-11-09 08:07:51    50     393217  753673            0             110
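A self-contained variant of 1) that reads from an actual file, and also maps the literal NULL to NA, might look like the following. The temporary file only stands in for don.csv, and na.strings = "NULL" is an assumption about how NULL should be treated.

```r
Lines <- 'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'

# Write the extract to a temporary file that plays the role of "don.csv"
path <- tempfile(fileext = ".csv")
writeLines(Lines, path)

# Read the raw lines, drop every double quote, then parse the cleaned text;
# na.strings = "NULL" (an assumption) turns the literal NULL into NA
L <- readLines(path)
df <- read.csv(text = gsub('"', "", L), na.strings = "NULL")
df
```

With na.strings = "NULL", is_anonymous comes back as an integer column instead of text.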
2) Also assuming that all double quotes are unwanted, another possibility is:
read.csv(pipe("sed 's/\042//g' don.csv"))
On Windows you will need Rtools installed and on your PATH for this to work; if sed is not on your PATH, give its full path instead, e.g. "C:\\Rtools\\bin\\sed".
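Because sed may not be available on every machine, one option is to try it first and fall back to the pure-R substitution from 1). The helper name read_unquoted() is hypothetical; Sys.which() only reports whether a sed executable is on the PATH, and the single-quoted sed script assumes a POSIX shell.

```r
# Hypothetical helper: remove all double quotes, then parse as CSV.
# Uses sed when it is on the PATH, otherwise base R's gsub().
read_unquoted <- function(path, ...) {
  sed <- Sys.which("sed")
  if (nzchar(sed)) {
    # "\042" is R's octal escape for a double quote, so the shell
    # receives:  sed 's/"//g' <path>   (POSIX shell quoting assumed)
    cmd <- paste(shQuote(sed), "'s/\042//g'", shQuote(path))
    read.csv(pipe(cmd), ...)
  } else {
    # No sed available: perform the same substitution in R
    read.csv(text = gsub('"', "", readLines(path)), ...)
  }
}

Lines <- 'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'

path <- tempfile(fileext = ".csv")  # stands in for "don.csv"
writeLines(Lines, path)
df <- read_unquoted(path, na.strings = "NULL")
df
```

Either branch should yield the same data frame; the sed branch avoids holding the raw lines in R, which may matter for large files.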
The input, Lines, is:
Lines <-
'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'
You can use:
d <- read.table(sep='"', skip=1, text=
'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'
)
d2 <- read.table(text=paste0(d$V2, d$V6), sep=",")
# or d2 <- read.table(text=paste0(d$V2, d$V6), sep=",", na.strings = "NULL")
(For your file, use file="don.csv" instead of my text=....)
The result is:
# d
#   V1  V2 V3                  V4 V5                        V6 V7
# 1 NA  1, NA 2016-08-10 12:50:30 NA ,100,65536,32772,NULL,110 NA
# 2 NA 65, NA 2016-11-09 07:57:32 NA  ,50,425986,2686978,1,110 NA
# 3 NA 66, NA 2016-11-09 08:07:51 NA   ,50,393217,753673,0,110 NA
# d2
#   V1 V2  V3     V4      V5   V6  V7
# 1  1 NA 100  65536   32772 NULL 110
# 2 65 NA  50 425986 2686978    1 110
# 3 66 NA  50 393217  753673    0 110
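To see why the useful pieces land in V2, V4 and V6, one can split a single data line on the double quote by hand. This is only an illustration using base R's strsplit(), not part of the answer's code; note that strsplit() drops the trailing empty piece, so it returns six parts where read.table() reports seven columns.

```r
# One data line from the question, exactly as it appears in the file
line <- '"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"'

# Split on the double quote; strsplit() keeps the leading empty piece
# but drops the trailing one, so this yields 6 pieces, not 7
parts <- strsplit(line, '"', fixed = TRUE)[[1]]
print(parts)

# Piece 4 is the timestamp (V4); pieces 2 and 6 (V2 and V6) glue back
# into one comma-separated record whose second field is empty
rebuilt <- paste0(parts[2], parts[6])
print(rebuilt)
```

The empty second field in the rebuilt record is why d2 has an all-NA V2 column.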
You will probably then want to rename the columns and bind them together with cbind().
You can get the column names with:
cnames <- read.table(sep=',', nrows=1, text=
'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'
)
as.character(unlist(cnames[1,]))
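As an alternative sketch (not from the original answer), scan() can read just the header line straight into a character vector; for the real file the call would be scan("don.csv", what = "", sep = ",", nlines = 1, quiet = TRUE).

```r
Lines <- 'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'

# Read only the first line, splitting on commas, as character strings
H <- scan(text = Lines, what = "", sep = ",", nlines = 1, quiet = TRUE)
H
```

This skips the intermediate one-row data frame and the unlist() step.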
(Again, for your file use file="don.csv" instead of my text=....) Putting everything together for your file:
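cnames <- read.table(sep=',', nrows=1, file="don.csv")   # header row only
H <- as.character(unlist(cnames[1,]))                    # column names as a character vector
d <- read.table(sep='"', skip=1, file="don.csv")         # split the data lines on the double quote
d2 <- read.table(text=paste0(d$V2, d$V6), sep=",", na.strings = "NULL")  # re-parse the numeric pieces
d.d2 <- cbind(d[, 4], d2[, -2])                          # timestamp + numbers, dropping the empty V2
names(d.d2) <- H[c(2, 1, 3:7)]                           # created_date comes first in this layout
d.d2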
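A self-contained run of the combined code above might look like this. A temporary file stands in for don.csv; passing quote = "" to the sep='"' call is an addition of mine so that the double quote is treated purely as a separator, and the final reordering back to the header order is optional.

```r
Lines <- 'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'

path <- tempfile(fileext = ".csv")  # stands in for "don.csv"
writeLines(Lines, path)

# Column names from the header line
cnames <- read.table(sep = ",", nrows = 1, file = path, stringsAsFactors = FALSE)
H <- as.character(unlist(cnames[1, ]))

# Split the data lines on the double quote; quote = "" (an addition)
# keeps '"' from also being interpreted as a quoting character
d <- read.table(sep = '"', skip = 1, file = path, quote = "",
                stringsAsFactors = FALSE)

# V2 ("1,") + V6 (",100,...") rebuild one comma-separated record per row
d2 <- read.table(text = paste0(d$V2, d$V6), sep = ",", na.strings = "NULL")

# Timestamp (V4) plus the numeric columns, minus the all-NA second field
d.d2 <- cbind(d[, 4], d2[, -2])
names(d.d2) <- H[c(2, 1, 3:7)]

# Optional: restore the column order used in the file
d.d2 <- d.d2[, H]
d.d2
```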