Expand list column of data.tables

Question

I have a data.table with a list column, where each element is itself a data.table:

dt <- data.table(id = c(1, 1, 2),
                 var = list(data.table(a = c(1, 2), b = c(3, 4)),
                            data.table(a = c(5, 6), b = c(7, 8)),
                            data.table(a = 9, b = 10)))

dt
# id             var
# 1:  1 <data.table>
# 2:  1 <data.table>
# 3:  2 <data.table>

Now I want to "unlist" this structure into a single flat table, with the rows of each nested data.table paired with the corresponding id.

I know how to expand the embedded data.table part with rbindlist, but I have no idea how to bind the flattened data.table to the variable "id".

The original dataset has 30 million rows and dozens of variables, so I would really appreciate a solution that is not only workable but also memory efficient.

Answer 1

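When every id appears only once, dt[, var[[1]], by=id] works. With repeated ids, as in the example above, var[[1]] keeps only the first nested table in each group, so the grouped form would need to be dt[, rbindlist(var), by=id]. However, I would use rbindlist on the whole column, as the OP mentioned: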

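# Tag each row of dt with its row number (as character, so it can name the list)
dt[, r := as.character(.I)]
# Stack the nested tables; idcol = "r" records which row of dt each piece came from
res <- dt[, rbindlist(setNames(var, r), idcol = "r")]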

Then merge on r (the row number in dt) if you really need any vars from there:

res[dt, on=.(r), `:=`(id = i.id)]
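
Put together, the steps above can be run end to end on the example data from the question. This is a minimal sketch, assuming the data.table package is installed; it spells out the idcol argument in full and otherwise follows the answer's own calls.

```r
library(data.table)

# Example data from the question
dt <- data.table(id = c(1, 1, 2),
                 var = list(data.table(a = c(1, 2), b = c(3, 4)),
                            data.table(a = c(5, 6), b = c(7, 8)),
                            data.table(a = 9, b = 10)))

# Row number of dt as character, used to name the list elements
dt[, r := as.character(.I)]

# One rbindlist() call over the whole list column; r identifies the source row
res <- dt[, rbindlist(setNames(var, r), idcol = "r")]

# Update join: copy id from dt into res by reference, matching on r
res[dt, on = .(r), id := i.id]

print(res)
```

The result has one row per row of the nested tables, with columns r, a, b and id.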

This is better than dt[, var[[1]], by=id] in a few ways:

rbindlist should be faster than something with a lot of by= groups.
If there are more vars in dt, all of them would have to end up in by=.
It is probably not necessary to carry over vars from dt at all, since they can always be grabbed from that table later, and they take up a lot less memory there.
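
To make the comparison concrete, here is a hedged sketch that runs a grouped route next to a single-call route. It is a variation on the answer, not the answer's exact code: it relies on rbindlist filling idcol with an integer index when the input list is unnamed, so no character key is needed. The names by_group, flat and src_row are illustrative and do not appear in the original.

```r
library(data.table)

dt <- data.table(id = c(1, 1, 2),
                 var = list(data.table(a = c(1, 2), b = c(3, 4)),
                            data.table(a = c(5, 6), b = c(7, 8)),
                            data.table(a = 9, b = 10)))

# Grouped route: one rbindlist() call per id group
by_group <- dt[, rbindlist(var), by = id]

# Single-call route: for an unnamed list, idcol is an integer index
# (1..length of the list), which here is the row number in dt
flat <- rbindlist(dt$var, idcol = "src_row")

# Look up id by row number only if it is actually needed in the flat table
flat[, id := dt$id[src_row]]

# Both routes give the same id / a / b values
stopifnot(isTRUE(all.equal(by_group[, .(id, a, b)], flat[, .(id, a, b)])))
```

With dozens of variables in dt, the single-call route carries only src_row into the flat table; any other column can be looked up the same way as id when it is needed.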

Answer 1: score 6, accepted, 2017-04-01 04:37:18
