How to initialize workers to use package functions in parallel
I am developing an R package and trying to use parallel processing in it for an embarrassingly parallel problem. I would like to write a loop or functional that uses the other functions from my package. I am working on Windows, and I have tried using parallel::parLapply and foreach::%dopar%, but I cannot get the workers (cores) to access the functions in my package. Here's an example of a simple package with two functions, where the second calls the first inside a parallel loop using %dopar%:
add10 <- function(x) x + 10

slowadd <- function(m) {
  cl <- parallel::makeCluster(parallel::detectCores() - 1)
  on.exit(parallel::stopCluster(cl), add = TRUE)  # stop the workers even if the loop fails
  doParallel::registerDoParallel(cl)
  `%dopar%` <- foreach::`%dopar%` # so %dopar% doesn't need to be attached
  foreach::foreach(i = 1:m) %dopar% {
    Sys.sleep(1)
    add10(i)
  }
}
When I load the package with devtools::load_all() and call the slowadd function, I get the error Error in {: task 1 failed - "could not find function "add10"".
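The error is consistent with how the workers are created: makeCluster() starts each worker as a separate R session (a PSOCK cluster, the default type and the one used on Windows, where forking is unavailable), so a function that exists only in the master session is not visible to the workers. A minimal reproduction using only the parallel package, with a top-level add10 standing in for the package function (an assumption; in the question it lives in the package namespace):

```r
library(parallel)

# Stand-in for the package function; it is defined only in the master session.
add10 <- function(x) x + 10

cl <- makeCluster(2)  # PSOCK cluster: each worker is a fresh R session
# The worker function refers to add10, but nothing ships add10 to the workers,
# so each worker fails with "could not find function".
res <- tryCatch(
  parLapply(cl, 1:3, function(i) add10(i)),
  error = function(e) conditionMessage(e)
)
stopCluster(cl)
print(res)
```

The doParallel backend behind %dopar% runs on the same kind of worker processes, which is why the foreach version fails with the same "could not find function" message.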
I have also tried explicitly initializing the workers with my package:
add10 <- function(x) x + 10

slowadd <- function(m) {
  cl <- parallel::makeCluster(parallel::detectCores() - 1)
  on.exit(parallel::stopCluster(cl), add = TRUE)  # stop the workers even if the loop fails
  doParallel::registerDoParallel(cl)
  `%dopar%` <- foreach::`%dopar%` # so %dopar% doesn't need to be attached
  foreach::foreach(i = 1:m, .packages = 'mypackage') %dopar% {
    Sys.sleep(1)
    add10(i)
  }
}
but I get the error Error in e$fun(obj, substitute(ex), parent.frame(), e$data): worker initialization failed: there is no package called 'mypackage'.
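This second error points at a different step: .packages loads the named package on each worker, and the message indicates that the workers look for an installed copy. devtools::load_all() loads the package from source into the current session without installing it, so there is nothing for the workers to find. A small diagnostic sketch asks each worker whether requireNamespace() succeeds; "mypackage" is the question's placeholder name, and the base tools package is included only for contrast:

```r
library(parallel)

cl <- makeCluster(2)
# For each package name, ask every worker whether requireNamespace() succeeds.
# A package that has only been load_all()-ed in the master is not installed,
# so the workers are expected to report FALSE for it.
found <- vapply(
  c("tools", "mypackage"),
  function(pkg) all(unlist(clusterCall(cl, requireNamespace, pkg, quietly = TRUE))),
  logical(1)
)
stopCluster(cl)
print(found)
```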
How can I get the workers to access the functions in my package? A solution using foreach would be great, but I'm completely open to solutions using parLapply or other functions/packages.
I was able to initialize the workers with my package's functions, thanks to helpful comments. By exporting all of the package functions needed in the loop in the NAMESPACE and installing my package with devtools::install(), foreach was able to find the package for initialization. The R script for the example would look like this:
#' @export
add10 <- function(x) x + 10

#' @export
slowadd <- function(m) {
  cl <- parallel::makeCluster(parallel::detectCores() - 1)
  on.exit(parallel::stopCluster(cl), add = TRUE)  # stop the workers even if the loop fails
  doParallel::registerDoParallel(cl)
  `%dopar%` <- foreach::`%dopar%` # so %dopar% doesn't need to be attached
  out <- foreach::foreach(i = 1:m, .packages = 'mypackage') %dopar% {
    Sys.sleep(1)
    add10(i)
  }
  return(out)
}
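Since the question was also open to parLapply, here is a minimal sketch of the same idea with the parallel package alone: attach the installed package on every worker before dispatching tasks, which is what foreach's .packages does. To keep it runnable, the base tools package and its exported file_ext() stand in for the hypothetical mypackage and add10, and par_with_pkg is a hypothetical helper name:

```r
library(parallel)

# Hypothetical helper: run FUN over X on a PSOCK cluster after attaching `pkg`
# on every worker (the parLapply analogue of foreach's .packages argument).
par_with_pkg <- function(X, FUN, pkg, n_workers = 2) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)  # stop the workers even if FUN fails
  # Runs library(pkg, character.only = TRUE) on each worker; pkg must be installed.
  clusterCall(cl, library, pkg, character.only = TRUE)
  parLapply(cl, X, FUN)
}

# tools stands in for the installed package; file_ext() is found on the
# workers because the clusterCall() step attached tools there.
res <- par_with_pkg(c("a.csv", "b.txt"), function(f) file_ext(f), pkg = "tools")
print(unlist(res))
```

With the real package, pkg would be "mypackage", which still has to be installed. Because library() attaches only exported objects, a function called directly in FUN must be exported or referenced as mypackage:::add10; helpers that add10 itself calls are looked up in the package namespace and can stay internal.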
This works, but it's not an ideal solution. First, it makes for a much slower workflow. Before adding parallelism, I ran devtools::load_all() every time I changed the package and wanted to test it, but now I have to reinstall the package every time, which is slow when the package is large. Second, every function that is needed in the parallel loop has to be exported so that foreach can find it. My actual use case has a lot of small utility functions that I would rather keep internal.
You can use devtools::load_all() inside the foreach loop or load the functions you need with source.
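out <- foreach::foreach(i = 1:m) %dopar% {
  Sys.sleep(1)
  # Relative paths resolve against the workers' working directory,
  # which needs to be the package root for these calls to work
  source("R/some_functions.R")
  load("R/sysdata.rda")
  add10(i)
}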
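Sourcing inside the loop body re-reads the files on every iteration. A variation, sketched with the parallel package only, is to initialize each worker once with clusterCall() and then run the loop. To keep it self-contained, the sketch writes a stand-in some_functions.R into a temporary directory; with a real package, r_dir would be the package's R/ folder. If pkgload is installed, the initializer could call pkgload::load_all(path) instead of source(). Because doParallel::registerDoParallel(cl) reuses the same worker processes, the same one-time initialization should also work before a %dopar% loop.

```r
library(parallel)

# Stand-in for a package's R/ directory (hypothetical contents), written to a
# temporary folder so the example runs anywhere.
r_dir <- file.path(tempdir(), "R")
dir.create(r_dir, showWarnings = FALSE)
writeLines("add10 <- function(x) x + 10", file.path(r_dir, "some_functions.R"))

cl <- makeCluster(2)
# One-time worker initialization: source every .R file into each worker's
# global environment (source() defaults to local = FALSE).
invisible(clusterCall(cl, function(dir) {
  for (f in list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE)) source(f)
  NULL
}, r_dir))

# The loop body now finds add10 on every worker without re-sourcing per task.
out <- parLapply(cl, 1:3, function(i) add10(i))
stopCluster(cl)
print(unlist(out))
```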