I am having problems reading the csv-file below (extract) using R:
id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"
df <- read_csv("don.csv", quote = "")
gives me qoutes in cells, which I can process afterwards, but can it not be done more smoothly during importing?
1) If there are no quotes in the input other than ones we don't want then this would work. If the input is coming from a file replace textConnection(Lines)
with "don.csv"
.
L <- readLines(textConnection(Lines))
read.csv(text = gsub('"', '', L))
giving:
id created_date stars charity_id user_id is_anonymous user_country_id
1 1 2016-08-10 12:50:30 100 65536 32772 NULL 110
2 65 2016-11-09 07:57:32 50 425986 2686978 1 110
3 66 2016-11-09 08:07:51 50 393217 753673 0 110
2) Also assuming that double quotes are all unwanted, another possibility is:
read.csv(pipe("sed 's/\042//g' don.csv"))
On Windows you will need to have Rtools installed and on your path for this to work or, if not on your path give the full path, eg "C:\\\\Rtools\\\\bin\\\\sed"
.
The input, Lines
is:
Lines <-
'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'
You can use:
d <- read.table(sep='"', skip=1, text=
'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'
)
d2 <- read.table(text=paste0(d$V2, d$V6), sep=",")
# or d2 <- read.table(text=paste0(d$V2, d$V6), sep=",", na.strings = "NULL")
(For your file you have to use file="don.csv"
instead of my text=...
.)
The result is
# d
# V1 V2 V3 V4 V5 V6 V7
# 1 NA 1, NA 2016-08-10 12:50:30 NA ,100,65536,32772,NULL,110 NA
# 2 NA 65, NA 2016-11-09 07:57:32 NA ,50,425986,2686978,1,110 NA
# 3 NA 66, NA 2016-11-09 08:07:51 NA ,50,393217,753673,0,110 NA
# d2
# V1 V2 V3 V4 V5 V6 V7
# 1 1 NA 100 65536 32772 NULL 110
# 2 65 NA 50 425986 2686978 1 110
# 3 66 NA 50 393217 753673 0 110
Eventually you want to rename the columns and bind the columns together with cbind()
The names of the columns you can get with:
cnames <- read.table(sep=',', nrows=1, text=
'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'
)
as.character(unlist(cnames[1,]))
(For your file you have to use file="don.csv"
instead of my text=...
.)
cnames <- read.table(sep=',', nrows=1, file="don.csv")
H <- as.character(unlist(cnames[1,]))
d <- read.table(sep='"', skip=1, file="don.csv")
d2 <- read.table(text=paste0(d$V2, d$V6), sep=",", na.strings = "NULL")
d.d2 <- cbind(d[, 4], d2[, -2])
names(d.d2) <- H[c(2, 1, 3:7)]
d.d2
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.