I have a dataset where multiple values are stored in a single column as comma-separated strings. I know it's not good practice, but that is beyond my control. An example dataset is as follows:
library(data.table)
a1 <- data.table(v1 = "a", v2 = "12,13,12,12,10")
a2 <- data.table(v1 = "b", v2 = "10,10,11,12")
a3 <- data.table(v1 = "b", v2 = "10,10,13,14,12")
DT <- rbindlist(list(a1, a2, a3))
I would like to create a new column that contains only the unique values across all rows of each group in v1, for example across both "b" rows. I have tried this:
DT[, v5 := paste(unlist(lapply(v2, function(x) unique(unlist(strsplit(as.character(x), ",", fixed = TRUE))))), collapse = ","), by = v1]
But it only excludes duplicated values within each row. What I got is:
v1 v2 v5
1: a 12,13,12,12,10 12,13,10
2: b 10,10,11,12 10,11,12,10,13,14,12
3: b 10,10,13,14,12 10,11,12,10,13,14,12
The values that I hope to get in column "v5" for rows "b" are 10,11,12,13,14.
I would very much appreciate any guidance on solving this problem.
DT[DT[,toString(unique(scan(text = v2,sep = ","))),by=v1],on="v1"]
Read 5 items
Read 9 items
v1 v2 V1
1: a 12,13,12,12,10 12, 13, 10
2: b 10,10,11,12 10, 11, 12, 13, 14
3: b 10,10,13,14,12 10, 11, 12, 13, 14
You can include quiet = TRUE so that scan does not print how many items were read:
DT[DT[,toString(unique(scan(text = v2,sep = ",",quiet = TRUE))),by=v1],on="v1"]
v1 v2 V1
1: a 12,13,12,12,10 12, 13, 10
2: b 10,10,11,12 10, 11, 12, 13, 14
3: b 10,10,13,14,12 10, 11, 12, 13, 14
DT[DT[,toString(unique(unlist(strsplit(v2,",")))),by=v1],on="v1"]
v1 v2 V1
1: a 12,13,12,12,10 12, 13, 10
2: b 10,10,11,12 10, 11, 12, 13, 14
3: b 10,10,13,14,12 10, 11, 12, 13, 14
Using paste and unlist:
DT[DT[,.(V5=paste(unique(unlist(strsplit(v2,","))),collapse=",")),by=v1],on="v1"]
v1 v2 V5
1: a 12,13,12,12,10 12,13,10
2: b 10,10,11,12 10,11,12,13,14
3: b 10,10,13,14,12 10,11,12,13,14
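If the goal is to add the column in place, as the question attempted with `:=`, rather than joining a summary back onto DT, something along these lines may work. This is only a sketch: it rebuilds the example table from the question and reuses the column name v5 from the question's attempt.

```r
library(data.table)

# Same example data as in the question
DT <- data.table(
  v1 = c("a", "b", "b"),
  v2 = c("12,13,12,12,10", "10,10,11,12", "10,10,13,14,12")
)

# Inside each v1 group, v2 is a character vector holding every row of the group.
# Split each row, flatten the pieces, de-duplicate across the whole group,
# then paste back into one string, which is recycled to every row of the group.
DT[, v5 := paste(unique(unlist(strsplit(v2, ",", fixed = TRUE))), collapse = ","),
   by = v1]

print(DT)
```

The difference from the attempt in the question is that unique is applied once to the flattened values of the whole group, not inside lapply to each row separately.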
You are pretty close to a solution. You need to combine the values of each group (paste with collapse) before applying unique.
You can summarise by v1 as:
DT[, .(v5 = paste(unique(unlist(strsplit(paste(v2,collapse = ","),
split = ","))),collapse=",")), by = v1]
# v1 v5
# 1: a 12,13,10
# 2: b 10,11,12,13,14
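To see why de-duplicating after combining the rows gives the desired result, the steps can be traced in base R on the two "b" strings alone. This is an illustrative sketch; the variable names are made up for the example.

```r
# The two v2 strings that belong to group "b"
v2_b <- c("10,10,11,12", "10,10,13,14,12")

# Step 1: split each string on commas (a list with one character vector per row)
pieces <- strsplit(v2_b, ",", fixed = TRUE)

# Step 2: flatten the list so the values of both rows sit in one vector
all_values <- unlist(pieces)

# Step 3: drop duplicates across the whole group, keeping first-appearance order
uniq <- unique(all_values)

# Step 4: join the remaining values back into one comma-separated string
result <- paste(uniq, collapse = ",")
print(result)  # "10,11,12,13,14"
```

Note that unique keeps values in order of first appearance, which happens to be ascending here. If numeric order is required in general, the values could be converted and sorted first, e.g. sort(as.integer(uniq)).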