Expand list column of data.tables
I have a data.table with a list column, where each element is a data.table:
library(data.table)
dt <- data.table(id = c(1, 1, 2),
                 var = list(data.table(a = c(1, 2), b = c(3, 4)),
                            data.table(a = c(5, 6), b = c(7, 8)),
                            data.table(a = 9, b = 10)))
dt
# id var
# 1: 1 <data.table>
# 2: 1 <data.table>
# 3: 2 <data.table>
Now I want to "unlist" this structure to:
a b id
1: 1 3 1
2: 2 4 1
3: 5 7 1
4: 6 8 1
5: 9 10 2
I know how to expand the embedded data.table part with rbindlist, but I have no idea how to bind the flattened data.table to the variable "id".
The original dataset has 30 million rows and dozens of variables, so I would really appreciate a solution that is not only workable but also memory efficient.
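In this case dt[, var[[1]], by=id] works, provided each id value appears in only one row of dt (with the repeated id in the example, var[[1]] picks up only the first table of each group, so the grouped form would need rbindlist(var) instead). However, I would use rbindlist directly, as the OP mentioned: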
dt[, r := as.character(.I) ]
res <- dt[, rbindlist(setNames(var, r), idcol = "r")]
Then merge on r (the row number of dt) if you really need any variables from there:
res[dt, on=.(r), `:=`(id = i.id)]
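Put together, the steps above can be sketched as one runnable script on the example data. This is only a small illustration, not a benchmark; I spell the argument out as idcol, and r is just the helper key column introduced above.

```r
library(data.table)

# Example data from the question: a list column of data.tables
dt <- data.table(
  id  = c(1, 1, 2),
  var = list(
    data.table(a = c(1, 2), b = c(3, 4)),
    data.table(a = c(5, 6), b = c(7, 8)),
    data.table(a = 9, b = 10)
  )
)

# Row key stored as character, so it matches the list names
# that rbindlist() writes into the idcol column
dt[, r := as.character(.I)]

# Stack all embedded tables; column r records the source row of dt
res <- rbindlist(setNames(dt$var, dt$r), idcol = "r")

# Update join: pull id from dt into res by reference, matching on r
res[dt, on = .(r), id := i.id]

print(res)
```

Only the columns you actually ask for in the update join are copied into res, so the other variables of dt stay where they are.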
This is better than dt[, var[[1]], by=id] in a few ways:
rbindlist should be faster than something with a lot of by= groups.
If dt has more variables, all of them will have to end up in by=.
There is probably no need to carry the variables from dt along at all, since they can always be grabbed from that table later and they take up a lot less memory there.
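As a sanity check, here is a small sketch comparing the grouped approach with the rbindlist one on the example data. Because id 1 spans two rows in this example, the grouped version below uses rbindlist(var) inside j rather than var[[1]]; that is my adaptation for the comparison. It says nothing about speed or memory on 30 million rows, which you would need to measure on your own data.

```r
library(data.table)

dt <- data.table(
  id  = c(1, 1, 2),
  var = list(
    data.table(a = c(1, 2), b = c(3, 4)),
    data.table(a = c(5, 6), b = c(7, 8)),
    data.table(a = 9, b = 10)
  )
)

# Grouped approach: one j evaluation per by= group,
# and every variable you want to keep has to be listed in by=
by_res <- dt[, rbindlist(var), by = id]

# rbindlist approach: a single bind, then an update join for id only
dt[, r := as.character(.I)]
rb_res <- rbindlist(setNames(dt$var, dt$r), idcol = "r")
rb_res[dt, on = .(r), id := i.id]

# Both should hold the same rows for this example
same <- identical(by_res$id, rb_res$id) &&
  identical(by_res$a, rb_res$a) &&
  identical(by_res$b, rb_res$b)
print(same)
```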