
Expand list column of data.tables

I have a data.table with a list column, where each element is itself a data.table:

library(data.table)

dt <- data.table(id = c(1, 1, 2),
                 var = list(data.table(a = c(1, 2), b = c(3, 4)),
                            data.table(a = c(5, 6), b = c(7, 8)),
                            data.table(a = 9, b = 10)))

dt
# id             var
# 1:  1 <data.table>
# 2:  1 <data.table>
# 3:  2 <data.table>

Now I want to "unlist" this structure to:

   a  b id
1: 1  3  1
2: 2  4  1
3: 5  7  1
4: 6  8  1
5: 9 10  2

I know how to expand the embedded data.tables with rbindlist, but I have no idea how to attach the variable "id" to the flattened data.table.

The original dataset has 30 million rows and dozens of variables, so I would really appreciate a solution that is not only workable but also memory efficient.

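dt[, var[[1]], by=id] works when each id appears in only one row, because var[[1]] keeps just the first table in each group; with repeated ids, as in the example, dt[, rbindlist(var), by=id] binds every table in the group. However, I use rbindlist on the whole column, as the OP mentioned: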

dt[, r := as.character(.I) ]
res <- dt[, rbindlist(setNames(var, r), idcol = "r")]

Then merge on r (the row numbers of dt) if you really need any variables from there:

res[dt, on=.(r), `:=`(id = i.id)]
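
If the character key and the join feel heavy at 30 million rows, one possible variant is sketched below. It is not benchmarked here. It relies on rbindlist filling the idcol column with integer positions when the input list has no names, and it assumes the var column is an unnamed list, as in the example. The id values are then looked up by position instead of joined.

```r
library(data.table)

dt <- data.table(id = c(1, 1, 2),
                 var = list(data.table(a = c(1, 2), b = c(3, 4)),
                            data.table(a = c(5, 6), b = c(7, 8)),
                            data.table(a = 9, b = 10)))

# With an unnamed list, idcol holds the integer position of each source
# table, which here is the row number of dt.
res <- rbindlist(dt$var, idcol = "r")

# Look up id by row position and add it by reference, without a join.
res[, id := dt$id[r]]

# Drop the helper column once it is no longer needed.
res[, r := NULL]

print(res)
```

Any other variable from dt can be pulled in the same way, one column at a time, before r is dropped.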

This is better than grouping with by=id in a few ways:

  • rbindlist should be faster than an operation with a lot of by= groups.
  • If dt has more variables to carry over, every one of them has to be listed in by=.
  • It is probably unnecessary to carry over variables from dt at all: they can always be looked up from that table later, and they take up far less memory there.
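
For comparison, here is a small check of how the grouped forms behave on the example data. Within a group, var is the list of every table in that group, so var[[1]] picks only the first one while rbindlist(var) binds them all. The names first_only and all_tables are just illustrative.

```r
library(data.table)

dt <- data.table(id = c(1, 1, 2),
                 var = list(data.table(a = c(1, 2), b = c(3, 4)),
                            data.table(a = c(5, 6), b = c(7, 8)),
                            data.table(a = 9, b = 10)))

# Keeps only the first table of each id group.
first_only <- dt[, var[[1]], by = id]

# Binds every table of each id group.
all_tables <- dt[, rbindlist(var), by = id]

print(nrow(first_only))
print(nrow(all_tables))
```

With repeated ids the two row counts differ, which is worth checking before relying on the grouped form.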
