
Parallelization in R: how to “source” on every node?

I have created parallel workers (all running on the same machine) using:

library(parallel)
MyCluster = makeCluster(8)

How can I make each of these 8 nodes source an R file I wrote? I tried:

clusterCall(MyCluster, source, "myFile.R")
clusterCall(MyCluster, 'source("myFile.R")')

I also tried several similar variations, but none of them worked. Can you please help me find the mistake?

Thank you very much!

The following code does what you want:

library(parallel)

cl <- makeCluster(4)
## call a function that sources test.R once on every worker
clusterCall(cl, function() { source("test.R") })

## do some parallel work

stopCluster(cl)

You can also use clusterEvalQ() to do the same thing:

library(parallel)

cl <- makeCluster(4)
## evaluate the expression source("test.R") on every worker
clusterEvalQ(cl, source("test.R"))

## do some parallel work

stopCluster(cl)

However, there is a subtle difference between the two methods. clusterCall() calls a function on each node, while clusterEvalQ() evaluates an expression on each node. If the list of files to source is stored in a variable, clusterCall() is easier to use, because you can pass that variable to the function as an argument. clusterEvalQ(cl, expr) treats expr as an unevaluated expression, so any variable inside it is looked up on the workers rather than on the master, and it has to be copied to the workers first (for example with clusterExport()).
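A minimal sketch of that difference, assuming two throwaway helper files created with tempfile() on a single machine (the functions add_one() and double_it() are hypothetical). With clusterCall() the vector of paths travels as an ordinary argument; with clusterEvalQ() the same vector has to be exported to the workers first with clusterExport().

```r
library(parallel)

## Two hypothetical helper files standing in for a variable list of sources
files <- c(tempfile(fileext = ".R"), tempfile(fileext = ".R"))
writeLines("add_one <- function(x) x + 1", files[1])
writeLines("double_it <- function(x) 2 * x", files[2])

cl <- makeCluster(2)

## clusterCall(): `files` is passed as an argument and shipped to every worker
invisible(clusterCall(cl, function(paths) {
  for (p in paths) source(p)
  invisible(NULL)
}, files))

## clusterEvalQ(): the expression runs on the workers, so `files`
## must first be copied into their global environments
clusterExport(cl, "files", envir = environment())
invisible(clusterEvalQ(cl, for (p in files) source(p)))

## Each worker reports whether both functions are now defined
defined <- unlist(clusterEvalQ(cl, exists("add_one") && exists("double_it")))

stopCluster(cl)
```

In practice you would pick one of the two approaches; running both here simply sources the files twice.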

If you source a local file, make sure the file actually exists on every node. A relative path is resolved against each worker's own working directory.
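One way to check this from the master, as a sketch: ask every worker whether it can see the file, and collect each worker's working directory for debugging. The path "test.R" is only a placeholder for your own file, and the two-worker cluster is an assumption.

```r
library(parallel)

cl <- makeCluster(2)

## Placeholder path; replace with the file you intend to source
path <- "test.R"

## Each worker checks for the file itself; a relative path is resolved
## against that worker's working directory
present <- unlist(clusterCall(cl, file.exists, path))

## Working directory of each worker, to help debug a missing relative path
wds <- unlist(clusterEvalQ(cl, getwd()))

stopCluster(cl)

if (!all(present)) {
  message("File not found on some workers; their working directories: ",
          paste(unique(wds), collapse = ", "))
}
```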

Otherwise, place the file on a network share or NFS mount and source it by its absolute path.
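A sketch of the absolute-path approach, assuming all workers can read the same filesystem. A temporary file stands in for the file on the network share, and square() is a hypothetical helper. normalizePath() turns the path into an absolute one on the master before it is handed to the workers, and the sourced function is then used in parSapply().

```r
library(parallel)

## Stand-in for a file on a share every node can read (hypothetical helper)
shared_file <- tempfile(fileext = ".R")
writeLines("square <- function(x) x^2", shared_file)

## Absolute path computed once on the master
abs_path <- normalizePath(shared_file, mustWork = TRUE)

cl <- makeCluster(2)

## Every worker sources the same absolute path, whatever its working directory
invisible(clusterCall(cl, function(p) {
  source(p)
  invisible(NULL)
}, abs_path))

## square() now lives in each worker's global environment
squares <- parSapply(cl, 1:4, function(i) square(i))

stopCluster(cl)
```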

Better still, and the standard answer: write a package, install it on each node, and then simply call library() or require().
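As a sketch of the package route: once the package is installed on every node, attaching it is a single clusterEvalQ() call. The base package tools is used below only as a stand-in for your own package (a hypothetical myhelpers, say). require() returns TRUE or FALSE, which gives an easy check that loading succeeded on every worker.

```r
library(parallel)

cl <- makeCluster(2)

## "tools" ships with R and stands in for your own package here;
## with a hypothetical package you would write require(myhelpers)
loaded <- unlist(clusterEvalQ(cl, require(tools)))

stopCluster(cl)

if (!all(loaded)) stop("package could not be loaded on every worker")
```

With library() instead of require(), a missing package raises an error on the worker, which clusterEvalQ() reports back on the master.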

