Storing simulation results as a data.table in R

Question

I have to run a lot of simulations, and they take a long time. I think data.table could reduce the processing time. How can I store the result of mdply(data.frame(prob = seq(from = 0.1, to = 0.9, by = 0.1)), rbinom, n = 5, size = 2) directly in a data.table, without first saving its output as a data.frame?

library(plyr)
df1 <- mdply(data.frame(prob=seq(from = 0.1, to = 0.9, by = 0.1)), rbinom, n = 5, size = 2)
library(data.table)
dt1 <- data.table(df1)

Edited

I know that I can use setDT(df1) to avoid creating dt1. However, the main problem is mdply itself: it builds a data.frame, and that takes a lot of time.

Answer 1

plyr and data.table serve very similar purposes, so you usually don't need to switch back and forth between the two at all. In this case you can do everything with data.table:

dt = data.table(prob = seq(0.1, 0.9, by = 0.1))
dt = dt[, as.list(rbinom(prob, n = 5, size = 2)), by = prob]
dt
   prob V1 V2 V3 V4 V5
1:  0.1  0  0  0  0  0
2:  0.2  0  0  0  0  1
3:  0.3  1  2  1  0  1
4:  0.4  1  1  2  1  0
5:  0.5  2  2  1  1  1
6:  0.6  1  1  0  0  1
7:  0.7  2  1  2  1  0
8:  0.8  2  1  2  0  1
9:  0.9  2  2  2  2  2
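
To spell out what the grouped call is doing, here is a minimal sketch of the same idea. The seed, the name n_draws, and the renamed draw1..draw5 columns are assumptions added for illustration; they are not part of the original answer.

```r
library(data.table)

set.seed(42)  # assumption: a fixed seed, only so the draws are repeatable

n_draws <- 5  # hypothetical name for the number of draws per probability
dt <- data.table(prob = seq(0.1, 0.9, by = 0.1))

# Within each group `prob` is a single value, so rbinom() returns n_draws
# integers; as.list() spreads them across the default columns V1..V5.
sims <- dt[, as.list(rbinom(n = n_draws, size = 2, prob = prob)), by = prob]

# Optional: replace the default V1..V5 names with more descriptive ones.
setnames(sims, paste0("V", seq_len(n_draws)), paste0("draw", seq_len(n_draws)))
```

The result has one row per probability and one column per draw, the same shape as the output above.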

I would add that my hunch is that the fastest way to do this is to build the matrix first and then bind it on as columns:

> mat = mapply(rbinom, prob = dt$prob, n = 5, size = 2)
> cbind(dt, t(mat))
   prob V1 V2 V3 V4 V5
1:  0.1  0  0  0  0  0
2:  0.2  1  0  0  1  1
3:  0.3  1  1  1  0  0
4:  0.4  1  0  2  1  1
5:  0.5  1  1  1  0  2
6:  0.6  2  0  2  1  1
7:  0.7  1  1  1  2  1
8:  0.8  1  2  1  0  2
9:  0.9  1  1  2  1  1
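
The transpose is needed because mapply() returns one column per probability. A small sketch that makes the shapes explicit; passing the constant arguments through MoreArgs and converting with as.data.table() are stylistic choices made here, not something the original code requires.

```r
library(data.table)

dt <- data.table(prob = seq(0.1, 0.9, by = 0.1))

# mapply() calls rbinom() once per element of prob; n and size are constant,
# so they are passed through MoreArgs here (the original passes them inline).
mat <- mapply(rbinom, prob = dt$prob, MoreArgs = list(n = 5, size = 2))
dim(mat)  # 5 rows (draws) by 9 columns (probabilities)

# Transpose so that each row lines up with one probability, then bind.
res <- cbind(dt, as.data.table(t(mat)))
```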

A very quick test on a table of roughly 80,000 rows shows that the matrix approach is faster:

> dt = data.table(prob = (seq(0.1, 0.9, by = 0.00001)))
> system.time(for(i in 1:10) dt[, as.list(rbinom(prob, n = 5, size = 2)), by = prob])
   user  system elapsed 
   6.14    0.00    6.16 
> system.time(for(i in 1:10) {mat = mapply(rbinom, prob = dt$prob, n = 5, size = 2) ; cbind(dt, t(mat))})
   user  system elapsed 
   2.61    0.00    2.62

And both are a substantial improvement on the original (df is not defined above; presumably it is the data.frame counterpart of dt):

> system.time(for(i in 1:10) {df1 = mdply(df, rbinom, n = 5, size = 2) ; dt1 = data.table(df1)})
   user  system elapsed 
 152.23   46.60  200.07
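
The matrix-first idea can probably be pushed one step further, because rbinom() recycles its prob argument and so can produce every draw in a single call. The following is a sketch under that assumption and has not been timed against the runs above; the seed and the name n_draws are illustrative only.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed for repeatability only

prob    <- seq(0.1, 0.9, by = 0.1)
n_draws <- 5  # hypothetical name: draws per probability

# rbinom() recycles `prob`, so repeating each probability n_draws times
# yields all draws in one call, ordered probability by probability.
draws <- rbinom(n = length(prob) * n_draws, size = 2,
                prob = rep(prob, each = n_draws))

# Fill row by row so that row i holds the n_draws values for prob[i].
mat <- matrix(draws, ncol = n_draws, byrow = TRUE)

res <- cbind(data.table(prob = prob), as.data.table(mat))
```

This avoids calling rbinom() once per row, which is where both mapply() and the grouped data.table call spend their time.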

Answer 1 (score 3, accepted, 2015-07-31 17:32:15)
