
How to avoid quotes in CSV import using R

I am having problems reading the CSV file below (extract) using R:

id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"

df <- read_csv("don.csv", quote = "")

gives me quotes in the cells, which I can clean up afterwards, but can't this be done more smoothly during import?

1) If the input contains no double quotes other than the ones we want to remove, then this works. If the input comes from a file, replace textConnection(Lines) with "don.csv".

L <- readLines(textConnection(Lines))
read.csv(text = gsub('"', '', L))

giving:

  id        created_date stars charity_id user_id is_anonymous user_country_id
1  1 2016-08-10 12:50:30   100      65536   32772         NULL             110
2 65 2016-11-09 07:57:32    50     425986 2686978            1             110
3 66 2016-11-09 08:07:51    50     393217  753673            0             110
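As a minimal sketch, approach 1) can be wrapped in a small helper. The name read_unquoted_csv is hypothetical (it is not part of base R), and passing na.strings = "NULL" is an assumption that the literal NULL in is_anonymous should become NA. It relies on the same precondition as above: every double quote in the input is unwanted.

```r
# Sample input from the question: unquoted header, data rows wrapped in quotes.
Lines <- 'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'

# Hypothetical helper: strip every double quote, then parse as ordinary CSV.
# `con` may be a connection or a file name such as "don.csv".
read_unquoted_csv <- function(con, ...) {
  L <- readLines(con)
  read.csv(text = gsub('"', '', L, fixed = TRUE), ...)
}

con <- textConnection(Lines)
df <- read_unquoted_csv(con, na.strings = "NULL", stringsAsFactors = FALSE)
close(con)

str(df)
```

With a real file the call would be read_unquoted_csv("don.csv", na.strings = "NULL").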

2) Again assuming that all double quotes are unwanted, another possibility is:

read.csv(pipe("sed 's/\042//g' don.csv"))

On Windows you will need Rtools installed and on your path for this to work; if it is not on your path, give the full path to sed, e.g. "C:\\Rtools\\bin\\sed".
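The \042 in the sed command may look cryptic. Inside an R string it is an octal escape for the double quote character, so sed receives a plain s/"//g. The short check below illustrates this; it only builds and prints the command string (the variable name cmd is just for illustration) and does not run sed.

```r
# "\042" is an octal escape: R converts it to a double quote
# before the command is ever handed to the shell.
cmd <- "sed 's/\042//g' don.csv"
cat(cmd, "\n")             # prints the command as the shell would receive it

identical("\042", "\"")    # TRUE
utf8ToInt("\042")          # 34, the code point of the double quote
```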

Note

The input, Lines, is:

Lines <-
'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'

You can use:

d <- read.table(sep='"', skip=1, text=
'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'
)
d2 <- read.table(text=paste0(d$V2, d$V6), sep=",")
# or d2 <- read.table(text=paste0(d$V2, d$V6), sep=",", na.strings = "NULL")

(For your file, use file="don.csv" instead of my text=... .)
The result is:

# d
#   V1  V2 V3                  V4 V5                        V6 V7
# 1 NA  1, NA 2016-08-10 12:50:30 NA ,100,65536,32772,NULL,110 NA
# 2 NA 65, NA 2016-11-09 07:57:32 NA  ,50,425986,2686978,1,110 NA
# 3 NA 66, NA 2016-11-09 08:07:51 NA   ,50,393217,753673,0,110 NA
# d2
#   V1 V2  V3     V4      V5   V6  V7
# 1  1 NA 100  65536   32772 NULL 110
# 2 65 NA  50 425986 2686978    1 110
# 3 66 NA  50 393217  753673    0 110
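To see why the useful pieces end up in V2, V4 and V6, it can help to split a single data row on the double quote by hand. This is only an illustration using strsplit(), not what read.table() does internally; in particular, strsplit() drops the empty piece after the final quote, so it returns six pieces where read.table() reports seven columns.

```r
# One data row from the sample input, split on every double quote.
row <- '"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"'
parts <- strsplit(row, '"', fixed = TRUE)[[1]]
parts

# Pieces 2 and 6 hold the comma-separated values, piece 4 holds the date.
# Pasting 2 and 6 together leaves an empty second field where the date was,
# which is why d2 has an all-NA column V2.
paste0(parts[2], parts[6])
```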

Finally, you will want to bind the columns together with cbind() and rename them.
You can get the column names with:

cnames <- read.table(sep=',', nrows=1, text=
'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'
)
as.character(unlist(cnames[1,]))

(For your file, use file="don.csv" instead of my text=... .)

The complete code for your file:

cnames <- read.table(sep=',', nrows=1, file="don.csv")
H <- as.character(unlist(cnames[1,]))

d <- read.table(sep='"', skip=1, file="don.csv")
d2 <- read.table(text=paste0(d$V2, d$V6), sep=",", na.strings = "NULL")
d.d2 <- cbind(d[, 4], d2[, -2])
names(d.d2) <- H[c(2, 1, 3:7)]
d.d2
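To try the complete code without a don.csv on disk, the sample rows can be written to a temporary file first. The sketch below does that and then, as an optional extra step not in the code above, puts the columns back into the order of the original header. The names tmp and out are placeholders chosen for this example.

```r
# Write the sample rows to a temporary file standing in for don.csv.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id',
  '"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"',
  '"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"',
  '"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'
), tmp)

# Header: read the first line as comma-separated data.
cnames <- read.table(sep = ',', nrows = 1, file = tmp)
H <- as.character(unlist(cnames[1, ]))

# Body: split on the double quote, then re-parse the comma-separated pieces.
d <- read.table(sep = '"', skip = 1, file = tmp)
d2 <- read.table(text = paste0(d$V2, d$V6), sep = ",", na.strings = "NULL")
d.d2 <- cbind(d[, 4], d2[, -2])
names(d.d2) <- H[c(2, 1, 3:7)]

# Optional: restore the column order of the original header.
out <- d.d2[, H]
out

unlink(tmp)
```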
