I am trying to use the data.table package along with plyr to do some parallel computation in R, and I am getting unexpected behavior. I am using Windows 7.
I have created the following function that produces a frequency table using data.table:
t_dt_test <- function(x){
  # creates a 1-d frequency table for x
  dt <- data.table(x)
  dt[, j = list(freq = .N), by = x]
}
Create some test data:
test <- list(letters[1:3],letters[1:3],letters[1:3])
This works fine using llply with .parallel = FALSE:
llply(test, t_dt_test, .parallel = FALSE)
[[1]]
x freq
1: a 1
2: b 1
3: c 1
But if I try it in parallel, it does not work:
library(doParallel)
nodes <- detectCores()
cl <- makeCluster(nodes)
llply(test, t_dt_test, .parallel = TRUE, .paropts = list(.packages = 'data.table'))
This returns:
Error in do.ply(i) : task 1 failed - "invalid subscript type 'list'"
It seems that the [.data.table method is not being passed to the nodes as I would expect.
I tried changing the function to
t_dt_test <- function(x){
  # creates a 1-d frequency table for x
  dt <- data.table(x)
  data.table:::`[.data.table`(x = dt, j = list(freq = .N), by = x)
}
but I still get the same error.
A similar question was asked here: Strange environment behavior in parallel plyr, but it got no answers.
Any suggestions?
With a very small amount of extra work you can parallelize this using the foreach package.
Try this:
library(data.table)
t_dt_test <- function(x){
  dt <- data.table(x)
  dt[, j = list(freq = .N), by = x]
}
test <- list(letters[1:3],letters[1:3],letters[1:3])
Use foreach() in sequential compute mode first, to ensure you have the correct syntax. Since the foreach() loop takes care of splitting and combining, llply() is no longer used:
library(foreach)
foreach(i = seq_along(test), .combine = c) %do% {
  list(
    t_dt_test(test[[i]])
  )
}
To run this in parallel, you only have to change %do% to %dopar% and remember to add data.table to the package list:
library(doParallel)
registerDoParallel(detectCores())
foreach(i = seq_along(test), .combine = c, .packages = "data.table") %dopar% {
  list(
    t_dt_test(test[[i]])
  )
}
The results are as expected:
[[1]]
x freq
1: a 1
2: b 1
3: c 1
[[2]]
x freq
1: a 1
2: b 1
3: c 1
[[3]]
x freq
1: a 1
2: b 1
3: c 1
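If you would rather manage the workers yourself, as the question does with makeCluster(), the same loop can probably be written along these lines. This is only a sketch under a few assumptions: the worker count of 2 is arbitrary, the name res is made up for illustration, and .combine is left at its default so that foreach() returns a plain list without the list()/c wrapping used above.

```r
library(data.table)
library(foreach)
library(doParallel)

t_dt_test <- function(x) {
  # creates a 1-d frequency table for x
  dt <- data.table(x)
  dt[, j = list(freq = .N), by = x]
}

test <- list(letters[1:3], letters[1:3], letters[1:3])

# Assumption: 2 workers, chosen arbitrarily; adjust to your machine.
cl <- makeCluster(2)
registerDoParallel(cl)  # register the explicit cluster as the foreach backend

# With the default .combine, foreach() collects the results in a list.
# .packages loads data.table on every worker before the body runs.
res <- foreach(i = seq_along(test), .packages = "data.table") %dopar% {
  t_dt_test(test[[i]])
}

# Release the worker processes once the parallel work is done.
stopCluster(cl)
```

Keeping a handle on cl makes it possible to shut the worker processes down explicitly with stopCluster() when you are finished.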