简体   繁体   中英

How to avoid quotes in CSV import using R

I am having problems reading the csv-file below (extract) using R:

id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"

df <- read_csv("don.csv", quote = "")

gives me qoutes in cells, which I can process afterwards, but can it not be done more smoothly during importing?

1) If there are no quotes in the input other than ones we don't want then this would work. If the input is coming from a file replace textConnection(Lines) with "don.csv" .

L <- readLines(textConnection(Lines))
read.csv(text = gsub('"', '', L))

giving:

  id        created_date stars charity_id user_id is_anonymous user_country_id
1  1 2016-08-10 12:50:30   100      65536   32772         NULL             110
2 65 2016-11-09 07:57:32    50     425986 2686978            1             110
3 66 2016-11-09 08:07:51    50     393217  753673            0             110

2) Also assuming that double quotes are all unwanted, another possibility is:

read.csv(pipe("sed 's/\042//g' don.csv"))

On Windows you will need to have Rtools installed and on your path for this to work or, if not on your path give the full path, eg "C:\\\\Rtools\\\\bin\\\\sed" .

Note

The input, Lines is:

Lines <-
'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'

You can use:

d <- read.table(sep='"', skip=1, text=
'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'
)
d2 <- read.table(text=paste0(d$V2, d$V6), sep=",")
# or d2 <- read.table(text=paste0(d$V2, d$V6), sep=",", na.strings = "NULL")

(For your file you have to use file="don.csv" instead of my text=... .)
The result is

# d
#   V1  V2 V3                  V4 V5                        V6 V7
# 1 NA  1, NA 2016-08-10 12:50:30 NA ,100,65536,32772,NULL,110 NA
# 2 NA 65, NA 2016-11-09 07:57:32 NA  ,50,425986,2686978,1,110 NA
# 3 NA 66, NA 2016-11-09 08:07:51 NA   ,50,393217,753673,0,110 NA
# d2
#   V1 V2  V3     V4      V5   V6  V7
# 1  1 NA 100  65536   32772 NULL 110
# 2 65 NA  50 425986 2686978    1 110
# 3 66 NA  50 393217  753673    0 110

Eventually you want to rename the columns and bind the columns together with cbind()
The names of the columns you can get with:

cnames <- read.table(sep=',', nrows=1, text=
'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'
)
as.character(unlist(cnames[1,]))

(For your file you have to use file="don.csv" instead of my text=... .)

The complete code for your file:

cnames <- read.table(sep=',', nrows=1, file="don.csv")
H <- as.character(unlist(cnames[1,]))

d <- read.table(sep='"', skip=1, file="don.csv")
d2 <- read.table(text=paste0(d$V2, d$V6), sep=",", na.strings = "NULL")
d.d2 <- cbind(d[, 4], d2[, -2])
names(d.d2) <- H[c(2, 1, 3:7)]
d.d2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM