
R - doRedis - Overwrite getTask to control the order of execution in parallel foreach loops

Problem: I need to control the order in which tasks are processed in parallel by a foreach loop. Unfortunately, foreach does not support this.

Solution in mind: use doRedis so that the Redis database holds all tasks executed in the foreach loop. To control the order, I want to replace getTask via setGetTask so that tasks are fetched in a pre-specified order. However, I could not find much documentation on how to do this.

Additional Information:

  1. There is a small paragraph on setGetTask, with an example, in the doRedis documentation.

     getTask <- function ( queue , job_id , ...)
     {
       key <- sprintf("
       redisEval("local x=redis.call('hkeys',KEYS[1])[1];
                  if x==nil then return nil end;
                  local ans=redis.call('hget',KEYS[1],x);
                  redis.call('hdel',KEYS[1],x);i
                  return ans",key)
     }
     setGetTask(getTask)

     I think, though, that the code in the documentation is syntactically incorrect (in my opinion it is missing a " and a closing parenthesis ")"). I thought this was not possible on CRAN, as the code in the documentation is executed on submission.

  2. Changing the getTask function has no effect on how the workers fetch tasks, even when I put obvious nonsense into the redisEval call, e.g. redisEval("dddddddddd(((").

  3. I only had access to the setGetTask function after installing the package from source, which I downloaded from the official CRAN package page (version 1.1.1). In my opinion this should be no different from installing it directly from CRAN.

Data: The data frame of tasks to execute looks like this:

taskName;taskQueuePosition;parameter1;parameterN
taskT;1;val1;10
taskK;2;val2;8
taskP;3;val3;7
taskA;4;val4;7

I want to use 'taskQueuePosition' to control the order: tasks with lower numbers should be executed first.

Questions:

  1. Does anybody know any sources where I can get more information on doing this with doRedis or on setGetTask?
  2. Does anybody know how I need to change getTask to achieve the ordering described above?
  3. Are there any other smart ideas for controlling the order of execution in a foreach loop? Preferably ones that still let me use doRedis as the parallel back end (changing it would mean a major change in the processing, for complicated technical infrastructure reasons).

Code (for easy reproduction):

The following assumes that the redis-server is started on the local machine.

Redis DB Filling:

library(doRedis)
library(foreach)

options('redis:num'=TRUE) # needed for proper execution

REDIS_JOB_QUEUE = "jobs"
registerDoRedis(REDIS_JOB_QUEUE)

# filling up the data frame
taskDF = data.frame(taskName=c("taskT","taskK","taskP","taskA"),
           taskQueuePosition=c(1,2,3,4),
           parameter1=c("val1","val2","val3","val4"),
           parameterN=c(10,8,7,7))

foreach(currTask=iter(taskDF, by='row'), 
        .verbose = TRUE
) %dopar% {
  print(paste("Executing task: ",currTask$taskName))
  Sys.sleep(currTask$parameterN)
}

removeQueue(REDIS_JOB_QUEUE)

Worker:

library(doRedis)
REDIS_JOB_QUEUE = "jobs"

startLocalWorkers(n=1, queue=REDIS_JOB_QUEUE)

I was able to solve the problem and can now control the order of task execution.

Additional information:

1) There seems to be a typo in the documentation that breaks the getTask example. Judging by the default_getTask function in the package file task.R, it should probably look something like this:

getTaskDefault <- function ( queue , job_id , ...)
{
  key <- sprintf("%s:%s",queue, job_id)
  return(redisEval("local x=redis.call('hkeys',KEYS[1])[1];
                   if x==nil then return nil end;
                   local ans=redis.call('hget',KEYS[1],x);
                   redis.call('set', KEYS[1] .. '.start.' .. x, x);
                   redis.call('hdel',KEYS[1],x);
                   return ans",key))
}

It seems that everything after the first percent sign in the first line of the function got lost. This would explain the unbalanced brackets and quotes.

2) setGetTask still does not have any effect for me. However, when I pass the getTask function through .options.redis in the foreach call that fills the DB (as described in the vignette of the package), it is called successfully.

3) Because of 2), I do not need the setGetTask function, so I can use the package from CRAN.
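To make the key construction concrete, here is a minimal base-R sketch of what that first line presumably produces. The queue name matches the example in this post; the job ID "42" is a made-up value used only for illustration.

```r
# Build the Redis hash key the way the reconstructed getTask does.
# "jobs" is the queue name used in this post; "42" is a hypothetical job ID.
queue  <- "jobs"
job_id <- "42"
key <- sprintf("%s:%s", queue, job_id)
print(key)   # [1] "jobs:42"
```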

----- Questions -----

1) The doRedis vignette describes how a custom getTask can be successfully set.

2 and 3) When the Lua script in the getTask function is modified as shown below, the tasks are drawn from the database in the order in which they were submitted. This is not exactly what I was asking for, but given time constraints and the fact that I have (or rather had) no idea about Lua scripting, controlling the order of submission via the taskQueuePosition column is, in my opinion, a satisfactory solution.

getTaskInOrder <- function ( queue , job_id , ...)
{
  key <- sprintf("%s:%s",queue, job_id)
  return(redisEval("

        local tasks=redis.call('hkeys',KEYS[1]); -- get all tasks

        local x=tasks[1];           -- get the first available task
        if x==nil then              -- if there are no tasks left, stop processing
          return nil
        end;

        local xMin = 65535;         -- if we have more than 65535 tasks, the task
                                    -- with the lowest taskID is not guaranteed to be found
        local i = 1;
        -- local iMinFound = -1;
        while (x ~= nil) do         -- search the array until there are no tasks left
          -- print('x: ',x)
          local xNum = tonumber(x);
          if(xNum<xMin) then
            xMin = xNum;
            -- iMinFound = i;
          end
          i=i+1;
          -- print('i is now: ',i);
          x=tasks[i];
        end
        -- print('Minimum is task number',xMin,' found at i ', iMinFound)
        x=tostring(xMin)            -- convert it back to a string (maybe it would
                                    -- be better to keep the original string somewhere,
                                    -- in case we lose some information while converting to a number)

        -- print('x is now:',x);
        -- print(KEYS[1] .. '.start.' .. x, x);
        -- print('');
        local ans=redis.call('hget',KEYS[1],x);
        redis.call('set', KEYS[1] .. '.start.' .. x, x);
        redis.call('hdel',KEYS[1],x);
        return ans",key))
}
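The selection logic of the Lua script can be mirrored in plain R to try it out without a Redis server. This is only a sketch: pick_lowest_task is a hypothetical helper, not part of doRedis, and the field names are assumed to be numeric strings, as the script above assumes as well.

```r
# Hypothetical helper mirroring the Lua loop: given the hash field names
# (task IDs stored as strings), return the numerically smallest one.
pick_lowest_task <- function(fields) {
  if (length(fields) == 0) return(NULL)   # no tasks left
  nums <- as.numeric(fields)              # same idea as tonumber(x) in Lua
  fields[which.min(nums)]                 # return the original string, not a reconverted number
}

# Example field names; the order returned by hkeys should not be relied on.
fields <- c("10", "2", "33", "4")
print(pick_lowest_task(fields))           # [1] "2"
```

Returning the original string avoids the number-to-string round trip mentioned in the script's comment, and the sketch does not need the 65535 starting value.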

Important note: I noticed that if a task is aborted, the order is disrupted: the resubmitted task (even though its task number remains the same) is executed after the originally submitted tasks. This is okay for me.

------ Code (for easy reproduction):------

This leads to the following code example (with 12 entries in the task data frame instead of the original 4):

Redis DB Filling:

library(doRedis)
library(foreach)

options('redis:num'=TRUE) # needed for proper execution

REDIS_JOB_QUEUE = "jobs"

getTaskInOrder <- function ( queue , job_id , ...)
{
  # ... body as shown above
}

registerDoRedis(REDIS_JOB_QUEUE)

# filling up the data frame already in order of tasks to be executed
# otherwise the dataframe has to be sorted by taskQueuePosition
taskDF = data.frame(taskName=c("taskA","taskB","taskC","taskD","taskE","taskF","taskG","taskH","taskI","taskJ","taskK","taskL"),
       taskQueuePosition=c(1,2,3,4,5,6,7,8,9,10,11,12),
       parameter1=c("val1","val2","val3","val4","val1","val2","val3","val4","val1","val2","val3","val4"),
       parameterN=c(5,5,5,4,4,4,4,3,3,3,2,2))

foreach(currTask=iter(taskDF, by='row'),
        .verbose = TRUE,
        .options.redis = list(getTask = getTaskInOrder)
) %dopar% {
  print(paste("Executing task: ",currTask$taskName))
  Sys.sleep(currTask$parameterN)
}

removeQueue(REDIS_JOB_QUEUE)
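
If the data frame is not already in execution order, it can be sorted by taskQueuePosition before being passed to foreach, as the comment in the code above notes. A minimal base-R sketch, using the four tasks from the question in a deliberately shuffled order:

```r
# Tasks from the question, deliberately out of order.
taskDF <- data.frame(
  taskName          = c("taskP", "taskT", "taskA", "taskK"),
  taskQueuePosition = c(3, 1, 4, 2),
  parameter1        = c("val3", "val1", "val4", "val2"),
  parameterN        = c(7, 10, 7, 8),
  stringsAsFactors  = FALSE
)

# Sort so that lower taskQueuePosition values are submitted first.
taskDF <- taskDF[order(taskDF$taskQueuePosition), ]
rownames(taskDF) <- NULL
print(taskDF$taskName)   # [1] "taskT" "taskK" "taskP" "taskA"
```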

Worker:

library(doRedis)
REDIS_JOB_QUEUE = "jobs"

startLocalWorkers(n=1, queue=REDIS_JOB_QUEUE)

Another note: in case you are processing long jobs, as I do, please be aware of a bug in doRedis 1.1.1 (the current version on CRAN) that causes tasks to be resubmitted (due to a timeout) even though the workers are still working on them.
