
R - get worker name when running in parallel

I am running a function in parallel. To get progress updates on the state of the work, I would like one, and only one, worker to report periodically on its progress. My natural thought is to have the function that the workers execute check the worker's name and give a status update only if the name matches a particular value. But I can't find a reliable way to determine this in advance. In Julia, for instance, there is a simple myid() function that returns a worker's ID (i.e. 1, 2, etc.). I am looking for something equivalent in R. The best I've found so far is to have each worker call Sys.getpid(). But I don't know a reliable way to write my script so that I know in advance which PID will be assigned to a worker. The basic script I'm looking to write looks like the one below, except that I need R's equivalent of the myid() function:

library(parallel)

Test_Fun = function(a){
    for (idx in 1:10){
        Sys.sleep(1)
        if (myid() == 1){  # myid() is Julia's function, used here as a placeholder
            print(idx)
        }
    }
}

mclapply(1:4, Test_Fun, mc.cores = 4)

The parallel package doesn't provide a worker ID function as of R 3.3.2. There also isn't a mechanism for initializing the forked workers that mclapply creates before they start executing tasks.

I suggest that you pass an additional task ID argument to the worker function by using the mcmapply function. If the number of tasks equals the number of workers, the task ID can be used as a worker ID. For example:

library(parallel)
Test_Fun = function(a, taskid){
    for (idx in 1:10){
        Sys.sleep(1)
        if (taskid == 1){
            print(idx)
        }
    }
}
mcmapply(Test_Fun, 1:4, 1:4, mc.cores = 4)
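To check that each task ID really corresponds to its own process when the number of tasks equals the number of workers, the worker function can return Sys.getpid() alongside the task ID. This is only a minimal sketch: Which_Proc is a hypothetical helper, the worker count of 2 is an assumption chosen so it runs on most machines, and because mcmapply relies on forking it needs a Unix-like system.

```r
library(parallel)

# Hypothetical helper: report which process handled which task ID.
Which_Proc <- function(a, taskid) {
    c(taskid = taskid, pid = Sys.getpid())
}

cores <- 2  # assumed worker count for this illustration
res <- mcmapply(Which_Proc, 1:cores, 1:cores, mc.cores = cores)
t(res)  # one row per task: the task ID and the PID of the process that ran it
```

The PIDs themselves change from run to run, which is why the task ID, not the PID, is the value to test against.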

But if there are more tasks than workers, you'll only see progress messages for the first task. You can work around that by initializing each worker when it executes its first task:

WORKERID <- NA  # indicates worker is uninitialized
Test_Fun = function(a, taskid){
    if (is.na(WORKERID)) WORKERID <<- taskid
    for (idx in 1:10){
        Sys.sleep(1)
        if (WORKERID == 1){
            print(idx)
        }
    }
}
cores <- 4
mcmapply(Test_Fun, 1:8, 1:cores, mc.cores = cores)

Note that this assumes that mc.preschedule is TRUE, which is the default. If mc.preschedule is FALSE and the number of tasks is greater than the number of workers, the situation is much more dynamic because each task is executed by a different worker process and the workers don't all execute concurrently.
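The effect of mc.preschedule can be inspected by returning the worker ID and the process ID from every task instead of printing progress. The following is a rough sketch rather than a definitive test: Check_Fun is a hypothetical variant of Test_Fun, the worker count of 2 and the 6 tasks are assumptions, and it requires a Unix-like system because the workers are forked.

```r
library(parallel)

WORKERID <- NA  # NA marks a worker that has not run a task yet

# Hypothetical variant of Test_Fun that returns IDs instead of printing progress.
Check_Fun <- function(a, taskid) {
    if (is.na(WORKERID)) WORKERID <<- taskid
    c(task = a, worker = WORKERID, pid = Sys.getpid())
}

cores <- 2  # assumed worker count for this illustration

# Prescheduled (the default): the tasks are divided among the workers up front.
pre <- mcmapply(Check_Fun, 1:6, 1:cores, mc.cores = cores, mc.preschedule = TRUE)

# Not prescheduled: a new process is forked for each task.
dyn <- mcmapply(Check_Fun, 1:6, 1:cores, mc.cores = cores, mc.preschedule = FALSE)

t(pre)  # tasks that share a worker ID also share a PID
t(dyn)  # PIDs generally differ from task to task
```

With prescheduling, the pid column should show at most `cores` distinct values, so a worker ID set during the first task stays valid for every later task on that worker. Without it, a worker ID stored in a global variable does not carry over between tasks, since each task starts in a fresh process.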

