Error using parallel plyr and data.table in R: Error in do.ply(i) : task 1 failed - “invalid subscript type 'list'”
I am trying to use the data.table package along with plyr to do some parallel computation in R, and I am getting unexpected behavior. I am using Windows 7.
I have created the following function, which produces a frequency table using data.table:
t_dt_test <- function(x) {
  # creates a 1-d frequency table for x
  dt <- data.table(x)
  dt[, j = list(freq = .N), by = x]
}
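To make the function's behavior concrete, here is a minimal sketch that calls it on a vector with a repeated value. The input vector is illustrative and does not come from the original post; the function body is the one shown above.

```r
library(data.table)

# Same function as in the question: a 1-d frequency table for x
t_dt_test <- function(x) {
  dt <- data.table(x)
  dt[, j = list(freq = .N), by = x]
}

# Illustrative input (an assumption, not from the original post)
res <- t_dt_test(c("a", "b", "a"))
print(res)
```

Each distinct value of x becomes one row, and freq holds the number of times it occurs, so "a" should be counted twice here and "b" once.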
Create some test data:
test <- list(letters[1:3],letters[1:3],letters[1:3])
This works fine using llply with .parallel = FALSE:
llply(test, t_dt_test, .parallel = FALSE)
[[1]]
x freq
1: a 1
2: b 1
3: c 1
But if I try it in parallel, it does not work:
library(doParallel)
nodes <- detectCores()
cl <- makeCluster(nodes)
llply(test, t_dt_test, .parallel = TRUE, .paropts = list(.packages = 'data.table'))
This returns:
Error in do.ply(i) : task 1 failed - "invalid subscript type 'list'"
It seems that the [.data.table method is not being passed to the nodes as I would expect.
I tried changing the function to:
t_dt_test <- function(x) {
  # creates a 1-d frequency table for x
  dt <- data.table(x)
  data.table:::`[.data.table`(x = dt, j = list(freq = .N), by = x)
}
but I still get the same error.
A similar question was asked here: Strange environment behavior in parallel plyr, but it got no answers.
Any suggestions?
With a very small amount of extra work you can parallelize this using the foreach package.
Try this:
library(data.table)
t_dt_test <- function(x) {
  dt <- data.table(x)
  dt[, j = list(freq = .N), by = x]
}
test <- list(letters[1:3],letters[1:3],letters[1:3])
Use foreach() in sequential compute mode first, to ensure you have the correct syntax. Since the foreach() loop takes care of splitting and combining, llply() is no longer used:
library(foreach)
foreach(i = seq_along(test), .combine = c) %do% {
list(
t_dt_test(test[[i]])
)
}
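As a side note, foreach() returns a list when no .combine function is given, so the list() wrapper together with .combine = c is not strictly needed. The following is a minimal sketch of that variant, not the answer's own code; the iteration variable v is a name chosen here for illustration.

```r
library(data.table)
library(foreach)

t_dt_test <- function(x) {
  dt <- data.table(x)
  dt[, j = list(freq = .N), by = x]
}
test <- list(letters[1:3], letters[1:3], letters[1:3])

# Iterate over the list elements directly. With no .combine argument,
# foreach collects the result of each iteration into a list.
res <- foreach(v = test) %do% t_dt_test(v)
print(length(res))
```

Both forms should produce a list with one frequency table per element of test; this one simply leaves the combining to foreach's default.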
To run this in parallel, you only have to change %do% to %dopar% and remember to add data.table to the package list:
library(doParallel)
registerDoParallel(detectCores())
foreach(i = seq_along(test), .combine = c, .packages = "data.table") %dopar% {
list(
t_dt_test(test[[i]])
)
}
The results are as expected:
[[1]]
x freq
1: a 1
2: b 1
3: c 1
[[2]]
x freq
1: a 1
2: b 1
3: c 1
[[3]]
x freq
1: a 1
2: b 1
3: c 1
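If you prefer to manage the workers yourself, as the question does with makeCluster(), the cluster object can be registered explicitly and shut down afterwards. This is a minimal sketch under a few assumptions: the worker count of 2 is arbitrary, the iteration variable v is a name chosen for illustration, and t_dt_test is defined in the calling environment so that foreach can export it to the workers.

```r
library(data.table)
library(foreach)
library(doParallel)  # also attaches the parallel package

t_dt_test <- function(x) {
  dt <- data.table(x)
  dt[, j = list(freq = .N), by = x]
}
test <- list(letters[1:3], letters[1:3], letters[1:3])

# Create the cluster and register it as the foreach backend.
# Two workers is an arbitrary choice for this sketch.
cl <- makeCluster(2)
registerDoParallel(cl)

# .packages loads data.table on every worker before the body runs
res <- foreach(v = test, .packages = "data.table") %dopar% t_dt_test(v)

# Shut the workers down once the work is finished
stopCluster(cl)
print(length(res))
```

Keeping a handle on the cluster makes it possible to call stopCluster() when you are done, so no idle R processes are left running.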