简体   繁体   中英

Any more concise `data.table` way to unnest a nested column in data.table?

Assuming we have a data.table with a nested column val

dt <- data.table(
  grp = c(1, 2, 1, 3, 4),
  val = list("a", c("b", "c"), c("d", "e", "f"), "g", c("h", "i"))
)

which shows as

> dt
   grp   val
1:   1     a
2:   2   b,c
3:   1 d,e,f
4:   3     g
5:   4   h,i

Question

I would like to unnest the val column, where a possible option is using tidyr::unnest , ie,

> dt %>%
+   unnest(val)
# A tibble: 9 × 2
    grp val
  <dbl> <chr>
1     1 a
2     2 b
3     2 c
4     1 d
5     1 e
6     1 f
7     3 g
8     4 h
9     4 i

I am wondering if we can implement it by using the data.table only .

Watch out the order of values in the column grp , I would like to preserve the order like 1,2,1,3,4 rather than 1,1,2,3,4 .


My data.table Attempt

My attempt is as below

> dt[, id := .I][, lapply(.SD, unlist), id][, id := NULL][]
   grp val
1:   1   a
2:   2   b
3:   2   c
4:   1   d
5:   1   e
6:   1   f
7:   3   g
8:   4   h
9:   4   i

or

> dt[,.(grp = rep(grp,lengths(val)), val = unlist(val))]
   grp val
1:   1   a
2:   2   b
3:   2   c
4:   1   d
5:   1   e
6:   1   f
7:   3   g
8:   4   h
9:   4   i

but I guess there might be some more concise and elegant way to do this, eg, without creating an auxiliary column id or using rep + lengths .

Any idea? Much appreciated!

One more option:

dt[, .(grp, val = unlist(val)), by = .I][, !"I"]
#      grp    val
#    <num> <char>
# 1:     1      a
# 2:     2      b
# 3:     2      c
# 4:     1      d
# 5:     1      e
# 6:     1      f
# 7:     3      g
# 8:     4      h
# 9:     4      i

PS. If you are using data.table 1.4.2 or older by =.I won't work. Instead you can use:

dt[, .(grp, val = unlist(val)), by = 1:nrow(dt)][, !"nrow"]

PS2. I think your dt[,.(grp = rep(grp,lengths(val)), val = unlist(val))] is neater.

I guess this works:

dt[, lapply(.SD, unlist), by = .(grp)]

   grp val
1:   1   a
2:   1   d
3:   1   e
4:   1   f
5:   2   b
6:   2   c
7:   3   g
8:   4   h
9:   4   i

Another option without group by like this:

library(data.table)
dt[rep(1:.N, lengths(val))][, val:=unlist(dt$val)][]
#>    grp val
#> 1:   1   a
#> 2:   2   b
#> 3:   2   c
#> 4:   1   d
#> 5:   1   e
#> 6:   1   f
#> 7:   3   g
#> 8:   4   h
#> 9:   4   i

Created on 2022-09-16 with reprex v2.0.2

This should work:

dt <- dt[, .(grp, unlist(val)), by=.(rleid(grp))][, rleid := NULL]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM