
How to initialize workers to use package functions in parallel

I am developing an R package and trying to use parallel processing in it for an embarrassingly parallel problem. I would like to write a loop or functional that uses the other functions from my package. I am working on Windows, and I have tried using parallel::parLapply and foreach::%dopar%, but I cannot get the workers (cores) to access the functions in my package. Here's an example of a simple package with two functions, where the second calls the first inside a parallel loop using %dopar%:

add10 <- function(x) x + 10

slowadd <- function(m) {
  cl <- parallel::makeCluster(parallel::detectCores() - 1)
  doParallel::registerDoParallel(cl)

  `%dopar%` <- foreach::`%dopar%` # so %dopar% doesn't need to be attached

  foreach::foreach(i = 1:m) %dopar% {
    Sys.sleep(1)
    add10(i)
  }

  parallel::stopCluster(cl)
}

When I load the package with devtools::load_all() and call slowadd(), it fails with: Error in {: task 1 failed - "could not find function "add10""
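The first error happens because the workers never see add10(). On Windows, parallel::makeCluster() creates PSOCK clusters (forking is not available there), and each PSOCK worker is a fresh R process that starts without the objects of the main session or anything loaded with devtools::load_all(). A minimal base-R sketch of this, assuming a top-level add10() as a stand-in for the package function and using only the parallel package, shows the function missing on the workers until it is copied over with clusterExport():

```r
library(parallel)

# Stand-in for the package function, defined at top level (global environment).
add10 <- function(x) x + 10

cl <- makeCluster(2)  # two PSOCK workers, i.e. two fresh R processes

# The workers start empty, so add10 does not exist there yet.
found_before <- unlist(clusterEvalQ(cl, exists("add10")))

# Copy add10 from the main session's global environment to every worker.
clusterExport(cl, "add10")
found_after <- unlist(clusterEvalQ(cl, exists("add10")))

# The anonymous function looks add10 up in each worker's global environment.
res <- parLapply(cl, 1:3, function(i) add10(i))
stopCluster(cl)

print(found_before)  # FALSE FALSE
print(found_after)   # TRUE TRUE
print(unlist(res))   # 11 12 13
```

This works because add10() here lives in the global environment. A function whose environment is a package namespace (as with load_all()) is not covered by this sketch, since the workers would presumably also need that namespace.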

I have also tried explicitly initializing the workers with my package:

add10 <- function(x) x + 10

slowadd <- function(m) {
  cl <- parallel::makeCluster(parallel::detectCores() - 1)
  doParallel::registerDoParallel(cl)

  `%dopar%` <- foreach::`%dopar%` # so %dopar% doesn't need to be attached

  foreach::foreach(i = 1:m, .packages = 'mypackage') %dopar% {
    Sys.sleep(1)
    add10(i)
  }

  parallel::stopCluster(cl)
}

but I get the error: Error in e$fun(obj, substitute(ex), parent.frame(), e$data): worker initialization failed: there is no package called 'mypackage'
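This second error comes from the .packages option: foreach asks each worker to attach the listed packages with library(), which can only find packages installed in the worker's library paths. A small pre-flight check can make that failure explicit before the loop starts. In the sketch below, pkg_available_on_workers() is a hypothetical helper (not part of parallel or foreach), "stats" is used as a package that ships with R, and "mypackage" is assumed to be loaded only with load_all() and therefore not installed.

```r
library(parallel)

# Hypothetical helper: TRUE only if `pkg` can be loaded on every worker.
# requireNamespace() returns FALSE instead of erroring when the package
# is not installed in any of the worker's library paths.
pkg_available_on_workers <- function(cl, pkg) {
  ok <- clusterCall(cl, requireNamespace, pkg, quietly = TRUE)
  all(unlist(ok))
}

cl <- makeCluster(2)
has_stats <- pkg_available_on_workers(cl, "stats")      # ships with R
has_mypkg <- pkg_available_on_workers(cl, "mypackage")  # assumed not installed
stopCluster(cl)

if (!has_mypkg) message("'mypackage' is not installed where the workers can find it")
```

Running such a check at the top of slowadd() would turn the worker-initialization error into a clearer message.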

How can I get the workers to access the functions in my package? A solution using foreach would be great, but I'm completely open to solutions using parLapply or other functions/packages.

I was able to initialize the workers with my package's functions, thanks to helpful comments. By making sure that all of the package functions needed in the loop were exported in the NAMESPACE, and by installing my package with devtools::install(), foreach was able to find the package for initialization. (devtools::load_all() only loads the package into the current R session, whereas .packages = 'mypackage' makes each worker, a separate R process, call library(mypackage), which requires an installed copy.) The R script for the example looks like this:

#' @export
add10 <- function(x) x + 10

#' @export
slowadd <- function(m) {
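  cl <- parallel::makeCluster(parallel::detectCores() - 1)
  on.exit(parallel::stopCluster(cl), add = TRUE) # stop the workers even if a task fails
  doParallel::registerDoParallel(cl)

  `%dopar%` <- foreach::`%dopar%` # so %dopar% doesn't need to be attached

  # .packages makes each worker run library(mypackage), so the package
  # must be installed (e.g. with devtools::install()), not just load_all()-ed
  out <- foreach::foreach(i = 1:m, .packages = 'mypackage') %dopar% {
    Sys.sleep(1)
    add10(i)
  }

  return(out)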
} 
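For comparison, a parLapply counterpart of .packages = 'mypackage' is to attach the package on each worker once with clusterEvalQ() before the loop. So that the sketch runs anywhere, it makes stand-in assumptions: the tools package (installed with R) plays the role of the installed mypackage, its exported file_ext() plays the role of add10(), and slow_ext() is a hypothetical name. It also stops the cluster via on.exit(), so the workers are shut down even if a task fails.

```r
library(parallel)

# Hypothetical parLapply version of slowadd(); "tools" stands in for the
# installed package and file_ext() for one of its exported functions.
slow_ext <- function(files, n_workers = 2) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)  # runs even if a task errors

  # Counterpart of .packages = "tools": attach the package on every worker.
  # This only works for packages installed in the workers' library paths.
  invisible(clusterEvalQ(cl, library(tools)))

  parLapply(cl, files, function(f) {
    Sys.sleep(0.1)  # stand-in for slow work
    file_ext(f)     # found because tools is attached on the worker
  })
}

res <- slow_ext(c("analysis.R", "sysdata.rda"))
unlist(res)  # "R" "rda"
```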

This works, but it's not an ideal solution. First, it makes for a much slower workflow. Before adding parallelism, I used devtools::load_all() every time I changed the package and wanted to test it, but now I have to reinstall the package every time, which is slow when the package is large. Second, every function that is used in the parallel loop needs to be exported so that foreach can find it. My actual use case has a lot of small utility functions that I would rather keep internal.

You can use devtools::load_all() inside the foreach loop, or load the functions you need with source():

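out <- foreach::foreach(i = 1:m) %dopar% {
  Sys.sleep(1)
  # Relative paths resolve against each worker's working directory,
  # assumed here to be the package root
  source("R/some_functions.R") # defines the functions needed, e.g. add10()
  load("R/sysdata.rda")        # internal package data, if the package has any
  add10(i)
}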
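Sourcing inside the loop body re-reads the files on every iteration. A variant that keeps this answer's idea is to source the files once per worker before the loop, for example with parallel::clusterCall(). The sketch below is self-contained under stated assumptions: it writes add10() to a temporary file that stands in for R/some_functions.R, and it passes an absolute path so it does not depend on the workers' working directory. With foreach, the same clusterCall() line should also work when run on cl after doParallel::registerDoParallel(cl), since foreach then sends its tasks to those same worker processes.

```r
library(parallel)

# Stand-in for the package's R/some_functions.R (assumption for this sketch).
fun_file <- tempfile(fileext = ".R")
writeLines("add10 <- function(x) x + 10", fun_file)

cl <- makeCluster(2)
# Source the file once on each worker; source() evaluates into the
# worker's global environment, where later tasks can find add10().
invisible(clusterCall(cl, source, normalizePath(fun_file)))

res <- parLapply(cl, 1:3, function(i) {
  Sys.sleep(0.1)  # stand-in for slow work
  add10(i)
})
stopCluster(cl)
unlist(res)  # 11 12 13
```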

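Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.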
