
Is it possible, in R parallel::mcparallel, to limit the number of cores used at any one time?

In R, the mcparallel() function in the parallel package forks off a new task to a worker each time it is called. If my machine has N (physical) cores and I fork off 2N tasks, for example, then each core starts off running two tasks, which is not desirable. I would rather start running N tasks on N workers and then, as each task finishes, submit the next task to the now-available core. Is there an easy way to do this?

My tasks take different amounts of time, so forking off the tasks serially in batches of N is not an option. There might be some workarounds, such as checking the number of active cores and then submitting new tasks when they become free, but does anyone know of a simple solution?

I have tried setting cl <- makeForkCluster(nnodes=N), which does indeed set N cores going, but these are not then used by mcparallel(). Indeed, there appears to be no way to feed cl into mcparallel(). The latter has an option mc.affinity, but it's unclear how to use this and it doesn't seem to do what I want anyway (and according to the documentation its functionality is machine dependent).

You have at least two possibilities:

  1. As mentioned above, you can use the "mc.cores" parameter (an argument of mclapply, not of mcparallel itself) or mcparallel's "mc.affinity" parameter. On AMD platforms "mc.affinity" is preferred since two cores share the same clock. For example, an FX-8350 has 8 cores, but core 0 has the same clock as core 1. If you start a task for 2 cores only, it is better to assign it to cores 0 and 1 rather than 0 and 2. "mc.affinity" does that. The price is losing load balancing.

    "mc.affinity" is present in recent versions of the package. See the changelog to find out when it was introduced.

  2. You can also use the OS's tools for setting affinity, e.g. "taskset":

    /usr/bin/taskset -c 0-1 /usr/bin/R ...

    Here you make your script run on cores 0 and 1 only.

Keep in mind that Linux numbers its cores starting from 0, whereas the parallel package follows R's indexing, so the first core is core number 1.
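To make the 1-based numbering concrete, here is a minimal sketch of pinning a forked job with "mc.affinity". It assumes a platform where parallel supports CPU affinity (mcaffinity() returns NULL where it does not), and the variable names current and pinned are only for illustration. The child simply reports the affinity mask it ended up with.

```r
library(parallel)

# CPUs (1-based) the current R process may run on; NULL if unsupported here.
current <- mcaffinity()

pinned <- NULL
if (!is.null(current)) {
  # Fork a job restricted to the first CPU this process is allowed to use.
  # In taskset terms, R's CPU 1 corresponds to Linux core 0.
  job <- mcparallel(mcaffinity(), mc.affinity = current[1])
  # mccollect() returns a list with one element per job.
  pinned <- mccollect(job)[[1]]
}
pinned
```

Note that this only restricts where a job may run; it does not limit how many jobs are running at once.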

I'd suggest taking advantage of the higher-level functions in parallel that include this functionality instead of trying to force low-level functions to do what you want.

In this case, try writing your tasks as different arguments of a single function. Then you can use mclapply() with the mc.preschedule parameter set to FALSE and the mc.cores parameter set to the number of processes you want to run at a time. With mc.preschedule = FALSE, one process is forked per task, and each time a task finishes a new process is forked for the next remaining task. (With mc.preschedule = TRUE, the tasks are instead divided among the workers up front, which suits many short tasks of similar length but not tasks of very different durations.)

Even if each task uses a completely different bit of code, you can create a list of functions and pass that to a wrapper function. For example, the following code executes two functions at a time.

library(parallel)

f1 <- function(x) {x^2}
f2 <- function(x) {2*x}
f3 <- function(x) {3*x}
f4 <- function(x) {x*3}
params <- list(f1, f2, f3, f4)
# wrapper applies whichever function it is given to the shared input inx
wrapper <- function(f, inx) {f(inx)}
# mc.preschedule = FALSE: one fork per task, at most mc.cores running at once
output <- mclapply(params, FUN = wrapper, mc.preschedule = FALSE, mc.cores = 2, inx = 5)

If need be, you could make params a list of lists, each holding the function definition together with the parameters to be passed to that function. I've used this approach frequently with tasks of different lengths and it works well.
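A minimal sketch of the list-of-lists variant might look like the following. The task definitions and the helper name run_task are hypothetical, chosen only for illustration; mc.preschedule = FALSE is used so that one process is forked per task and at most mc.cores of them run at once.

```r
library(parallel)

# Hypothetical tasks: each entry pairs a function with its own arguments.
tasks <- list(
  list(fun = function(x) x^2,             args = list(x = 5)),
  list(fun = function(x, y) x + y,        args = list(x = 1, y = 2)),
  list(fun = function(s) nchar(s),        args = list(s = "abc")),
  list(fun = function(n) sum(seq_len(n)), args = list(n = 10))
)

# Call the task's function with that task's own argument list.
run_task <- function(task) do.call(task$fun, task$args)

# At most two tasks run at any one time; results come back in input order.
output <- mclapply(tasks, run_task, mc.preschedule = FALSE, mc.cores = 2)
```

Because each task carries its own arguments, the wrapper no longer needs a shared extra argument such as inx.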

Of course, it may be that your various tasks are just different calls to the same function, in which case you can use mclapply directly without having to write a wrapper function.
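As a related sketch, and not something taken from the answers above: the fork cluster created in the question can be used by the cluster-based functions in parallel rather than by mcparallel(). clusterApplyLB() hands the next task to whichever worker becomes free. The worker count and sleep durations below are assumptions for illustration only.

```r
library(parallel)

n_workers <- 2L                      # assumed number of workers
cl <- makeForkCluster(nnodes = n_workers)

# Hypothetical task durations in seconds, deliberately uneven.
durations <- c(0.3, 0.1, 0.2, 0.1)

# Load-balanced apply: each worker receives a new task as soon as it is idle.
results <- clusterApplyLB(cl, durations, function(d) {
  Sys.sleep(d)
  Sys.getpid()                       # report which worker process ran the task
})

stopCluster(cl)

# Number of distinct worker processes used; never more than n_workers.
length(unique(unlist(results)))
```

The results come back in the order of the inputs, regardless of which worker finished first.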
