Storing simulation results as data.table in R

I have to run a large number of simulations, and they take a long time. I think data.table could reduce the processing time. How can I store the results of mdply(data.frame(prob=seq(from = 0.1, to = 0.9, by = 0.1)), rbinom, n = 5, size = 2) in a data.table without first saving the output to a data.frame?

library(plyr)
df1 <- mdply(data.frame(prob=seq(from = 0.1, to = 0.9, by = 0.1)), rbinom, n = 5, size = 2)
library(data.table)
dt1 <- data.table(df1)

Edited

I know that I can use setDT(df1) to avoid creating dt1. However, the main problem is mdply itself, which creates a data.frame and takes a lot of time.
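
For reference, a minimal sketch of the setDT() conversion mentioned above. The small data.frame here is a made-up stand-in for the mdply() output, not the real simulation result.

```r
library(data.table)

# Hypothetical stand-in for the data.frame that mdply() would return
df1 <- data.frame(prob = c(0.1, 0.5, 0.9), V1 = c(0L, 1L, 2L))

# setDT() converts df1 to a data.table by reference, so no second
# object (dt1) is created
setDT(df1)

class(df1)  # now includes "data.table"
```

This only removes the copy step; it does nothing about the time spent inside mdply().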

plyr and data.table serve very similar purposes, so you usually don't need to switch back and forth between the two at all. In this case you can do everything with data.table:

dt = data.table(prob = seq(0.1, 0.9, by = 0.1))
dt = dt[, as.list(rbinom(prob, n = 5, size = 2)), by = prob]
dt
   prob V1 V2 V3 V4 V5
1:  0.1  0  0  0  0  0
2:  0.2  0  0  0  0  1
3:  0.3  1  2  1  0  1
4:  0.4  1  1  2  1  0
5:  0.5  2  2  1  1  1
6:  0.6  1  1  0  0  1
7:  0.7  2  1  2  1  0
8:  0.8  2  1  2  0  1
9:  0.9  2  2  2  2  2
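
A small variation on the call above, sketched under a couple of assumptions: the seed is arbitrary and only makes reruns repeatable, and the column names draw1 to draw5 are invented for illustration. It spells out the rbinom() arguments by name and shows how the grouped call produces one row per probability.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so that reruns give the same draws
dt <- data.table(prob = seq(0.1, 0.9, by = 0.1))

# by = prob forms one group per probability, so prob has length 1 inside j.
# as.list() spreads the 5 draws over 5 columns, and setNames() gives them
# descriptive (hypothetical) names instead of the default V1..V5.
res <- dt[, setNames(as.list(rbinom(n = 5, size = 2, prob = prob)),
                     paste0("draw", 1:5)),
          by = prob]
res
```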

I would add that my hunch is that the fastest way to do this is to build the matrix first and then attach its columns.

> mat = mapply(rbinom, prob = dt$prob, n = 5, size = 2)
> cbind(dt, t(mat))
   prob V1 V2 V3 V4 V5
1:  0.1  0  0  0  0  0
2:  0.2  1  0  0  1  1
3:  0.3  1  1  1  0  0
4:  0.4  1  0  2  1  1
5:  0.5  1  1  1  0  2
6:  0.6  2  0  2  1  1
7:  0.7  1  1  1  2  1
8:  0.8  1  2  1  0  2
9:  0.9  1  1  2  1  1
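
Since rbinom() recycles its prob argument across draws, the matrix could probably also be built with a single rbinom() call instead of mapply(). This is a sketch that is not part of the timings in this thread, so treat any speed benefit as untested; the seed and the n_draws name are illustrative.

```r
library(data.table)

set.seed(42)  # arbitrary seed for repeatable draws
dt <- data.table(prob = seq(0.1, 0.9, by = 0.1))
n_draws <- 5  # draws per probability, as in the question

# Repeating each probability n_draws times lets one vectorised rbinom()
# call produce every draw: the first n_draws values use prob[1], and so on.
draws <- rbinom(n = n_draws * nrow(dt), size = 2,
                prob = rep(dt$prob, each = n_draws))

# byrow = TRUE puts the n_draws values for each probability on one row
mat <- matrix(draws, ncol = n_draws, byrow = TRUE)
res <- cbind(dt, mat)
res
```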

A very quick test on a table of roughly 80,000 rows (the seq() call below yields 80,001 values) shows this is faster:

> dt = data.table(prob = (seq(0.1, 0.9, by = 0.00001)))
> system.time(for(i in 1:10) dt[, as.list(rbinom(prob, n = 5, size = 2)), by = prob])
   user  system elapsed 
   6.14    0.00    6.16 
> system.time(for(i in 1:10) {mat = mapply(rbinom, prob = dt$prob, n = 5, size = 2) ; cbind(dt, t(mat))})
   user  system elapsed 
   2.61    0.00    2.62 

Both are a substantial improvement on the original approach (df below is presumably a data.frame holding the same prob column; its definition is not shown):

> system.time(for(i in 1:10) {df1 = mdply(df, rbinom, n = 5, size = 2) ; dt1 = data.table(df1)})
   user  system elapsed 
 152.23   46.60  200.07
