unique jobs with kue for node.js

Question

I would like jobs.create to fail if an identical job is already in the system. Is there any way to accomplish this?

I need to run the same job every 24 hours, but some jobs can take more than 24 hours, so I need to be sure that the job isn't already in the system (active, queued, or failed) before adding it.

UPDATED: OK, I am going to simplify the problem so I can explain it here. Let's say I have an analytics service and I have to send a report to my users once a day. Completing these reports sometimes takes several hours, even more than a day (only in a few cases, but it is a possibility).

I need a way to know which jobs are currently running so that I can avoid duplicate jobs. I couldn't find anything in the kue API that tells me which jobs are currently running. I also need some kind of event that fires when more jobs are needed, so that I can call my getMoreJobs producer.

Maybe my approach is wrong; if so, please let me know a better way to solve my problem.

This is my simplified code:

var kue = require('kue'),   
    cluster = require('cluster'),
    numCPUs = require('os').cpus().length;

numCPUs = CONFIG.sync.workers || numCPUs; 

var jobs = kue.createQueue();

if (cluster.isMaster) {
    console.log('Starting master pid:' + process.pid);
    jobs.on('job complete', function(id){
        // Remove each job from Redis once it has completed
        kue.Job.get(id, function(err, job){
            if (err || !job) return;
            job.remove(function(err){
                if (err) throw err;
                console.log('removed completed job #%d', job.id);
            });
        });
    });

    function getMoreJobs() {
        console.log('looking for more jobs...');
        getOutdateReports(function (err, reports) {
            if (err) return setTimeout(getMoreJobs, 5 * 60 * 60 * 1000);

            reports.forEach(function(report) {
                jobs.create('reports', {
                    id: report.id,
                    title: report.name,
                    params: report.params
                }).attempts(5).save();
            });

            setTimeout(getMoreJobs, 60 * 60 * 1000);
        });
    }

    //Create the jobs
    getMoreJobs();

    console.log('Starting ', numCPUs, ' workers');
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }

    cluster.on('death', function(worker) {
        console.log('worker pid:' + worker.pid + ' died!'.bold.red);
    });

} else {
    //Process the jobs
    console.log('Starting worker pid:' + process.pid);
    jobs.process('reports', 20, function(job, done){
        //completing my work here
        veryHardWorkGeneratingReports(function(err) {
            if (err) return done(err);
            return done();
        });
    });
}

Answer 1

The answer to one of your questions is that Kue moves the jobs it pops off the Redis queue into the "active" state, and you'll never see them unless you look for them.

The answer to the other question is that your distributed work queue is the consumer of tasks, not the producer. Mingling the two like you have is okay, but it's a muddy paradigm. What I've done with Kue is to write a wrapper for Kue's JSON API, so that a job can be put into the queue from anywhere in the system. Since you seem to need to shovel jobs in, I suggest writing a separate producer application that does nothing but get external jobs and stick them into your Kue work queue. It can monitor the work queue and load a batch when jobs are running low, or, as I would do, it can shovel jobs in as fast as it can while you spin up multiple instances of your consumer application to process the load more quickly.

To reiterate: your separation of concerns isn't very good here. You should have a producer of tasks that's completely separate from your task consumer app. This gives you more flexibility, easier scaling (just fire up another consumer on another machine and you're scaled!) and code that is easier to manage overall. If possible, you should also give whoever supplies these tasks that you "go looking for" access to your Kue server's JSON API, instead of going out and finding the tasks yourself. The job producer can then schedule its own tasks with Kue.
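
A minimal sketch of such a separate producer, which submits a report job over HTTP instead of calling jobs.create() inside the worker app. It assumes the Kue JSON API accepts a POST to /job with type, data and options fields, and that the Kue HTTP server is reachable at the host and port you pass in; check both against your Kue version. buildJobRequest and submitReport are hypothetical helper names.

```javascript
var http = require('http');

// Build the HTTP request for one report job.
// Assumption: Kue's JSON API creates a job on POST /job with this body shape.
function buildJobRequest(report, host, port) {
    var body = JSON.stringify({
        type: 'reports',
        data: { id: report.id, title: report.name, params: report.params },
        options: { attempts: 5 }
    });
    return {
        options: {
            host: host,
            port: port,
            path: '/job',
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                'Content-Length': Buffer.byteLength(body)
            }
        },
        body: body
    };
}

// Send the request; calls back with (err, statusCode, responseText).
function submitReport(report, host, port, callback) {
    var request = buildJobRequest(report, host, port);
    var req = http.request(request.options, function (res) {
        var text = '';
        res.setEncoding('utf8');
        res.on('data', function (chunk) { text += chunk; });
        res.on('end', function () { callback(null, res.statusCode, text); });
    });
    req.on('error', callback);
    req.end(request.body);
}
```

The producer would call submitReport for each outdated report, and the consumer app would keep only the jobs.process('reports', ...) part.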

Answer 2

Look at https://github.com/LearnBoost/kue.

In the json.js script, check lines 64-112. There you'll find methods that return an object containing jobs, optionally filtered by type, state, or id range (jobRange(), jobStateRange(), jobTypeRange()).

Scroll down the main page to the JSON API section, where you'll find examples of the returned objects.

You know much better than I do how to call and use those methods.

jobs.create() will fail if you pass an unknown keyword. I would create a function that checks the current job inside the forEach loop and returns a keyword. Then just call this function instead of passing a literal keyword in the jobs.create() parameters.
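
One way to sketch the "check before creating" idea is to look for a job with the same report id in every state that should block a new one. The range function is passed in as a parameter so the sketch runs without Redis. With Kue you would pass kue.Job.rangeByType, which is assumed here to take (type, state, from, to, order, callback); verify that against your version. findExistingJob and createUnique are hypothetical helper names.

```javascript
// States in which an existing job should block a new one.
// "inactive" is Kue's name for queued jobs.
var BLOCKING_STATES = ['inactive', 'active', 'failed', 'delayed'];

// Search each blocking state in turn for a job whose data.id matches.
// Calls back with (err, job) where job is null when nothing was found.
function findExistingJob(rangeByType, type, reportId, callback) {
    var states = BLOCKING_STATES.slice();

    (function next() {
        var state = states.shift();
        if (!state) return callback(null, null); // no match in any state

        rangeByType(type, state, 0, -1, 'asc', function (err, found) {
            if (err) return callback(err);
            for (var i = 0; i < found.length; i++) {
                if (found[i].data && found[i].data.id === reportId) {
                    return callback(null, found[i]);
                }
            }
            next();
        });
    })();
}

// Create the job only when no duplicate exists.
// Calls back with (err, job, created) where created is true for a new job.
function createUnique(queue, rangeByType, type, data, callback) {
    findExistingJob(rangeByType, type, data.id, function (err, existing) {
        if (err) return callback(err);
        if (existing) return callback(null, existing, false);

        var job = queue.create(type, data).attempts(5);
        job.save(function (saveErr) {
            callback(saveErr || null, job, true);
        });
    });
}
```

Inside the forEach loop this would replace the direct jobs.create(...).save() call. The check and the save are two separate steps, so this only prevents duplicates when a single producer is adding jobs.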

The information you get through those methods in json.js may also help you create that "moreJobToDo" event.
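
A rough sketch of such an event: poll the number of queued jobs and emit an event when it drops below a threshold. The counting function is injected so the sketch runs on its own. With Kue it could wrap jobs.inactiveCount(callback), which is assumed to exist in your version. createLowWaterMonitor and the 'moreJobsNeeded' event name are made up for illustration.

```javascript
var EventEmitter = require('events').EventEmitter;

// countInactive(callback) must call back with (err, numberOfQueuedJobs).
// Assumption: with Kue this could be function (cb) { jobs.inactiveCount(cb); }
function createLowWaterMonitor(countInactive, threshold) {
    var monitor = new EventEmitter();

    // Run one check; emits 'moreJobsNeeded' when the queue is running low.
    monitor.check = function () {
        countInactive(function (err, total) {
            if (err) return monitor.emit('error', err);
            if (total < threshold) monitor.emit('moreJobsNeeded', total);
        });
    };

    return monitor;
}
```

In the master process you could register getMoreJobs as the 'moreJobsNeeded' listener, add an 'error' listener, and call monitor.check from setInterval instead of rescheduling getMoreJobs on a fixed timer.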

solution1
3 2012-05-08 22:17:06

solution2
2 ACCEPTED 2012-01-27 14:49:47
