
collapse rows in data.table

I have a data.table with 1M rows and 2 columns.

Dummy data:

require(data.table)
ID <- c(1,2,3)
variable <- c("a,b","a,c","c,d")
dt <- data.table(ID,variable)
dt
#    ID variable
# 1:  1      a,b
# 2:  2      a,c
# 3:  3      c,d

Now I want to split the column "variable" into separate rows, one row per comma-separated value, keeping the matching "ID". This is similar to what the "melt" function in reshape2 or melt.data.table in data.table does.

Here's what I want:

ID variable
1  a
1  b
2  a
2  c
3  c
3  d

PS: Given the desired result, I already know how to do the reverse step:

dt2 <- data.table(ID = c(1,1,2,2,3,3), variable = c("a","b","a","c","c","d"))
dt3 <- dt2[, list(variables = paste(variable, collapse = ",")), by = ID]

Any tips or suggestions?

Since strsplit is vectorised, and it is going to be the time-consuming operation here, I'd avoid calling it once per group. Instead, split the entire column on "," in a single call, then rebuild the data.table by repeating each ID once for every piece its row produced:

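# Split the whole column in one vectorised call; the result is a list with one character vector per row
var = strsplit(dt$variable, ",", fixed=TRUE)
# Number of pieces each row was split into
len = vapply(var, length, 0L)
# Repeat each ID once per piece and flatten the list of pieces
ans = data.table(ID=rep(dt$ID, len), variable=unlist(var))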

#    ID variable
# 1:  1        a
# 2:  1        b
# 3:  2        a
# 4:  2        c
# 5:  3        c
# 6:  3        d
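
For comparison, here is a minimal sketch of the per-group form that the answer advises against, next to the whole-column form, using the dummy data from the question. The names by_group, parts and whole_column are only illustrative, and lengths() is used as a base R shorthand for vapply(var, length, 0L). No timings are claimed here; the sketch only checks that the two forms agree on this small input.

```r
library(data.table)

dt <- data.table(ID = c(1, 2, 3), variable = c("a,b", "a,c", "c,d"))

# Per-group variant: strsplit is called once for every ID
by_group <- dt[, list(variable = unlist(strsplit(variable, ",", fixed = TRUE))), by = ID]

# Whole-column variant: strsplit is called once in total
parts <- strsplit(dt$variable, ",", fixed = TRUE)
whole_column <- data.table(ID = rep(dt$ID, lengths(parts)), variable = unlist(parts))

# On this input both variants produce the same rows in the same order
stopifnot(isTRUE(all.equal(by_group, whole_column)))
```

With 1M rows the per-group form would issue one strsplit call per ID, which is the overhead the whole-column form is meant to avoid. Note that the two forms are only guaranteed to return rows in the same order when each ID appears once, as it does here.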
