
Error using parallel plyr and data.table in R: Error in do.ply(i) : task 1 failed - "invalid subscript type 'list'"

I am trying to use the data.table package along with plyr to do some parallel computation in R, and I am getting unexpected behavior. I am using Windows 7.

I have created the following function, which produces a frequency table using data.table:

library(data.table)
library(plyr)

t_dt_test <- function(x) {
  # creates a 1-d frequency table for x
  dt <- data.table(x)
  dt[, j = list(freq = .N), by = x]
}
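As a quick check that the function itself is fine outside any parallel backend, here is a minimal sketch that calls it directly on a single vector with a repeated value (the input c("a", "b", "a") is only an illustrative choice). With by = x, data.table keeps the groups in order of first appearance, so "a" should come first with a count of 2.

```r
library(data.table)

# Same function as in the question: a 1-d frequency table for x
t_dt_test <- function(x) {
  dt <- data.table(x)
  dt[, j = list(freq = .N), by = x]
}

# Sequential sanity check on one vector that contains a repeated value
res <- t_dt_test(c("a", "b", "a"))
print(res)
```

If this prints the expected counts, the problem lies in how the function is run on the workers rather than in the data.table code.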

Create some test data:

 test <- list(letters[1:3],letters[1:3],letters[1:3])

This works fine using llply with .parallel = FALSE:

llply(test, t_dt_test, .parallel = FALSE)
[[1]]
   x freq
1: a    1
2: b    1
3: c    1

But if I try it in parallel, it does not work:

library(doParallel)
nodes <- detectCores()
cl <- makeCluster(nodes)

llply(test, t_dt_test, .parallel = TRUE, .paropts = list(.packages = 'data.table'))

This returns:

Error in do.ply(i) : task 1 failed - "invalid subscript type 'list'"

It seems that [.data.table is not being passed to the nodes as I would expect.
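One way to test the hypothesis that data.table is not available on the nodes is to ask each worker directly. The sketch below uses parallel::clusterEvalQ() on a small cluster (2 workers is an arbitrary assumption), checks whether data.table is attached, attaches it, and checks again. It only diagnoses what is loaded on the workers; it does not by itself explain the plyr error.

```r
library(parallel)

# Small illustrative cluster; 2 workers is an arbitrary choice
cl <- makeCluster(2)

# Is data.table attached on each worker before we load anything?
before <- unlist(clusterEvalQ(cl, "data.table" %in% .packages()))

# Attach data.table on every worker, then check again
invisible(clusterEvalQ(cl, library(data.table)))
after <- unlist(clusterEvalQ(cl, "data.table" %in% .packages()))

print(before)
print(after)

# Shut down the worker processes
stopCluster(cl)
```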

I tried changing the function to call the method explicitly:

t_dt_test <- function(x) {
  # creates a 1-d frequency table for x
  dt <- data.table(x)
  data.table:::`[.data.table`(x = dt, j = list(freq = .N), by = x)
}

but I still get the same error.

A similar question was asked here: Strange environment behavior in parallel plyr, but it got no answers.

Any suggestions?

With a very small amount of extra work, you can parallelize this using the foreach package.

Try this:

library(data.table)

t_dt_test <- function(x){ 
  dt <- data.table(x)
  dt[, j = list(freq = .N), by = x] 
}

test <- list(letters[1:3],letters[1:3],letters[1:3])

Use foreach() in sequential mode (%do%) first, to make sure the syntax is correct. Since the foreach() loop takes care of splitting and combining the results, llply() is no longer needed:

library(foreach)
foreach(i = seq_along(test), .combine = c) %do% {
  list(
    t_dt_test(test[[i]])
  )
}

To run this in parallel, you only have to change %do% to %dopar% and remember to add data.table to the .packages list:

library(doParallel)
registerDoParallel(detectCores())
foreach(i = seq_along(test), .combine = c, .packages = "data.table") %dopar% {
  list(
    t_dt_test(test[[i]])
  )
}
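Note that the question's setup calls makeCluster() but never passes cl to registerDoParallel(). If you prefer to manage the cluster explicitly instead of passing a core count, a minimal sketch (assuming 2 workers; any count works) could look like this. foreach's default combine already returns a list, so the list() / .combine = c wrapping is not needed here.

```r
library(data.table)
library(foreach)
library(doParallel)

t_dt_test <- function(x) {
  dt <- data.table(x)
  dt[, j = list(freq = .N), by = x]
}
test <- list(letters[1:3], letters[1:3], letters[1:3])

# Create the cluster explicitly and register it as the foreach backend
cl <- makeCluster(2)  # 2 workers assumed for illustration
registerDoParallel(cl)

# .packages attaches data.table on each worker; t_dt_test and test
# are exported automatically because the loop body references them
res <- foreach(i = seq_along(test), .packages = "data.table") %dopar% {
  t_dt_test(test[[i]])
}

# Release the worker processes when done
stopCluster(cl)
print(res)
```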

The results are as expected:

[[1]]
   x freq
1: a    1
2: b    1
3: c    1

[[2]]
   x freq
1: a    1
2: b    1
3: c    1

[[3]]
   x freq
1: a    1
2: b    1
3: c    1
