Error using parallel plyr and data.table in R: Error in do.ply(i) : task 1 failed - “invalid subscript type 'list'”

I am trying to use the data.table package along with plyr to do some parallel computation in R, and I am getting unexpected behavior. I am using Windows 7.

I have created the following function, which produces a frequency table using data.table:

 t_dt_test <- function(x){ 
    #creates a 1-d frequency table for x
    dt <- data.table(x)
    dt[, j = list(freq = .N), by = x] 
}

Create some test data

 test <- list(letters[1:3],letters[1:3],letters[1:3])

This works fine using llply with .parallel = FALSE

 llply(test, t_dt_test, .parallel = FALSE)
     [[1]]
   x freq
1: a    1
2: b    1
3: c    1

But if I try it in parallel, it does not work:

library(doParallel)
nodes <- detectCores()
cl <- makeCluster(nodes)
# register the cluster as the parallel backend used by plyr
registerDoParallel(cl)

llply(test, t_dt_test, .parallel = TRUE, .paropts = list(.packages = 'data.table'))

Returns this

Error in do.ply(i) : task 1 failed - "invalid subscript type 'list'"

It seems that the [.data.table is not being passed to the nodes as I would expect.

I tried changing the function to

 t_dt_test <- function(x){ 
    #creates a 1-d frequency table for x
    dt <- data.table(x)
    data.table:::`[.data.table`(x = dt,  j = list(freq = .N), by = x)
}

but I still get the same error.

A similar question was asked here: Strange environment behavior in parallel plyr, but it got no answers.

Any suggestions?

With a very small amount of extra work you can parallelize this using the foreach package.

Try this:

library(data.table)

t_dt_test <- function(x){ 
  dt <- data.table(x)
  dt[, j = list(freq = .N), by = x] 
}

test <- list(letters[1:3],letters[1:3],letters[1:3])

Use foreach() in sequential compute mode first, to ensure you have the correct syntax. Since the foreach() loop takes care of splitting and combining, llply() is no longer used:

library(foreach)
foreach(i = seq_along(test), .combine = c) %do% {
  list(
    t_dt_test(test[[i]])
  )
}

To run this in parallel, you only have to change %do% to %dopar% and remember to add data.table to the package list:

library(doParallel)
registerDoParallel(detectCores())
foreach(i = seq_along(test), .combine = c, .packages = "data.table") %dopar% {
  list(
    t_dt_test(test[[i]])
  )
}

The results are as expected:

[[1]]
   x freq
1: a    1
2: b    1
3: c    1

[[2]]
   x freq
1: a    1
2: b    1
3: c    1

[[3]]
   x freq
1: a    1
2: b    1
3: c    1
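
If you would rather manage the worker processes yourself, as the question does with makeCluster(), the same loop can be run against an explicit cluster object. The following is only a sketch under a few assumptions: the worker count of 2 is arbitrary, the variable name res is just for illustration, and it relies on foreach returning a list by default, so the list() wrapper and .combine = c are dropped.

```r
library(data.table)
library(foreach)
library(doParallel)

# Same function and test data as above
t_dt_test <- function(x) {
  dt <- data.table(x)
  dt[, j = list(freq = .N), by = x]
}

test <- list(letters[1:3], letters[1:3], letters[1:3])

# Create the workers explicitly (2 is an arbitrary choice for this sketch)
# and register them as the backend used by %dopar%
cl <- makeCluster(2)
registerDoParallel(cl)

# With no .combine argument, foreach collects the results in a list
res <- foreach(i = seq_along(test), .packages = "data.table") %dopar% {
  t_dt_test(test[[i]])
}

# Shut the workers down and fall back to the sequential backend
stopCluster(cl)
registerDoSEQ()
```

Stopping the cluster when you are done avoids leaving idle R processes behind, which is easy to overlook on Windows, where each worker is a separate R session.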
