How do I make doSMP play nicely with plyr?

This code works:

library(plyr)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=FALSE) 

While this code fails:

library(doSMP)
workers <- startWorkers(2)
registerDoSMP(workers)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE) 
stopWorkers(workers)

>Error in do.ply(i) : task 3 failed - "subscript out of bounds"
In addition: Warning messages:
1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’

2: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’

I'm using R 2.1.12, plyr 1.4 and doSMP 1.0-1. Has anyone figured out a way around this?

Edit: In response to Andrie, here's a further illustration:

system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel=FALSE)) #1
system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel=TRUE)) #2
library(doSMP)
workers <- startWorkers(2)
registerDoSMP(workers)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel=FALSE)) #3
system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel=TRUE)) #4
stopWorkers(workers)

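The first three calls work, but each takes about 3 seconds. Call #2 gives a warning that no parallel backend is registered, so it executes sequentially. Call #4 gives the same error I quoted in my original post.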
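The "no parallel backend registered" warning from call #2 can be anticipated by asking foreach what is registered before calling ddply with .parallel=TRUE. This is only a minimal sketch: it assumes the foreach package is installed, and the variable name has_backend is made up for illustration.

```r
library(foreach)

# TRUE once a backend such as doMC, doSNOW or doSMP has been registered
has_backend <- getDoParRegistered()

if (has_backend) {
  # Name of the registered backend and how many workers it reports
  message("backend: ", getDoParName(), ", workers: ", getDoParWorkers())
} else {
  message("no parallel backend registered; %dopar% will run sequentially")
}
```

Running this right after registerDoSMP(workers) should at least confirm whether the registration itself took effect, which helps separate registration problems from the error raised inside ddply.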

/edit: Curiouser and curiouser. On my Mac, the following works:

library(plyr)
library(doMC)
registerDoMC()
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE)

But this fails:

library(plyr)
library(doSMP)
workers <- startWorkers(2)
registerDoSMP(workers)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE) 
stopWorkers(workers)

And this also fails:

library(plyr)
library(snow)
library(doSNOW)
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE) 
stopCluster(cl)

So I guess the various parallel backends for foreach are not interchangeable.
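One way to check whether a failure comes from the backend itself or from the way plyr drives it is to run the same grouped sum through foreach directly. The sketch below is an assumption-laden illustration, not a fix: it uses the sequential backend registerDoSEQ() (which ships with foreach) so it runs anywhere, and the names pieces and res are made up. Swapping in registerDoMC(), registerDoSNOW(cl) or registerDoSMP(workers) would exercise the backend under test.

```r
library(foreach)

x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)

# Sequential backend so the sketch runs without a cluster;
# replace with the backend you want to test.
registerDoSEQ()

# One data frame per level of V, roughly what ddply does internally
pieces <- split(x, x$V)

# Sum Z within each piece and stack the one-row results
res <- foreach(piece = pieces, .combine = rbind) %dopar% {
  data.frame(V = piece$V[1], V1 = sum(piece$Z))
}

print(res)
```

If this succeeds on a backend where the ddply call fails, the problem is more likely in how plyr hands work to foreach than in the backend.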

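Although @hadley has already answered this well, I'd like to add that I think plyr now works with other foreach parallel backends. Here is a link to a blog entry with an example of plyr used in conjunction with doSNOW: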

To confirm @LeeZamparo's answer, plyr now seems to work with snow, at least with R version 2.15.0 on Windows 7. The last chunk of code in the question works, albeit with a cryptic warning:

library(plyr)
library(snow)
library(doSNOW)
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)

library(microbenchmark)
mb <- microbenchmark(

      PP <- ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE),
      NP <- ddply(x, .(V), function(df) sum(df$Z),.parallel=FALSE) 
                     )

stopCluster(cl)

The cryptic warning:

> warnings()
Warning messages:
1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...

It's not very fast, which I guess is down to overhead:

> mb
Unit: milliseconds
                                                             expr
1 NP <- ddply(x, .(V), function(df) sum(df$Z), .parallel = FALSE)
2 PP <- ddply(x, .(V), function(df) sum(df$Z), .parallel = TRUE)
        min        lq    median        uq       max
1  11.91518  15.74567  20.10944  23.30453  38.09237
2 314.58008 336.81160 348.42421 358.57337 575.11220

Checking that it gives the expected result:

> PP
  V V1
1 X  4
2 Y  6
3 Z  5

Extra details about this session:

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252   
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.1-3 doSNOW_1.0.6         iterators_1.0.6     
[4] foreach_1.4.0        plyr_1.7.1           snow_0.3-10          

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.0 tools_2.15.0

It turns out that plyr only works with doMC, but the developer is working on it.
