
Sidekiq: Find last job

I have two Sidekiq jobs. The first one loads a JSON feed of articles and splits it into multiple jobs, one per article. It also creates a log entry and stores a start_time.

class LoadFeed
  include Sidekiq::Worker

  def perform url
    log = Log.create! start_time: Time.now, url: url
    articles = load_feed(url) # this one loads the feed
    articles.each do |article|
      ProcessArticle.perform_async(article, log.id)
    end
  end
end

The second job processes an article and updates the end_time field of the previously created log, so I can find out how long the whole process (loading the feed, splitting it into jobs, processing the articles) took.

class ProcessArticle
  include Sidekiq::Worker

  def perform data, log_id
    process(data)
    Log.find(log_id).update_attribute(:end_time, Time.now)
  end
end

But now I have a couple of questions:

  1. Log.find(log_id).update_attribute(:end_time, Time.now) isn't atomic (it reads the row and then writes it in a separate query), and because the jobs run concurrently, this could lead to incorrect end_time values. Is there a way to do an atomic update of a datetime field in MySQL with the current time?
  2. The feed can get pretty long (~800k articles), and updating a value 800k times when only the last update matters seems like a lot of unnecessary work. Is there a way to find out which job is the last one, and only update the end_time field in that job?

For 1), you can do the update in a single query (one fewer than find plus update_attribute) and let MySQL supply the time:

Log.where(id: log_id).update_all('end_time = now()')
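One caveat: MySQL's NOW() is the time at which the statement began executing, so an UPDATE that has to wait for the row lock can still commit an earlier time after a later one. If that matters, a guard that only moves end_time forward, something like `update_all('end_time = GREATEST(COALESCE(end_time, NOW()), NOW())')`, avoids it (COALESCE is needed because GREATEST returns NULL if any argument is NULL). The sketch below models the difference in plain Ruby with no database; `LogRow` and its methods are hypothetical stand-ins, and the Mutex plays the role of the row lock.

```ruby
# Hypothetical in-memory stand-in for one row of the logs table.
class LogRow
  attr_reader :end_time

  def initialize
    @end_time = nil
    @lock = Mutex.new # plays the role of MySQL's row lock
  end

  # Last writer wins, like update_attribute / 'end_time = now()'.
  def overwrite_end_time(time)
    @lock.synchronize { @end_time = time }
  end

  # Only moves forward, like
  # 'end_time = GREATEST(COALESCE(end_time, NOW()), NOW())'.
  def advance_end_time(time)
    @lock.synchronize do
      @end_time = time if @end_time.nil? || time > @end_time
    end
  end
end

t1 = Time.utc(2024, 1, 1, 12, 0, 0) # job A's statement starts first
t2 = Time.utc(2024, 1, 1, 12, 0, 1) # job B's statement starts one second later

# The writes reach the row in the opposite order: B first, then A.
naive = LogRow.new
naive.overwrite_end_time(t2)
naive.overwrite_end_time(t1)
puts naive.end_time   # stale: 2024-01-01 12:00:00 UTC

guarded = LogRow.new
guarded.advance_end_time(t2)
guarded.advance_end_time(t1)
puts guarded.end_time # 2024-01-01 12:00:01 UTC
```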

For 2), one way to solve this is to update the end time only once all articles have been processed, for example by keeping a boolean on each article that you can query. This does not reduce the number of queries, but most of them become cheap reads instead of 800k writes to the same log row, so it should perform noticeably better.

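# Assumes each article row has a `processed` boolean and a
# `needs_processing` scope, e.g. where(processed: false).
if feed.articles.needs_processing.none?
  Log.where(id: log_id).update_all('end_time = now()')
end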
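A different technique for the same problem, not the one in the answer above, is an atomic countdown: store the number of articles when the feed is split, have every job decrement it atomically as its last step, and let only the job that takes it to zero write end_time. Since Sidekiq already runs on Redis, Redis `DECR` (which returns the value after decrementing) is a natural fit; with MySQL the decrement and the read would have to happen inside one transaction. The sketch below simulates the idea in plain Ruby, with a Mutex standing in for the atomic operation; `Countdown` is a hypothetical name.

```ruby
# Hypothetical countdown, one per feed load, initialised to the article count.
class Countdown
  def initialize(count)
    @remaining = count
    @lock = Mutex.new # stands in for an atomic DECR
  end

  # Atomically decrements and returns the new value.
  def decrement
    @lock.synchronize { @remaining -= 1 }
  end
end

articles = (1..1_000).to_a
countdown = Countdown.new(articles.size)
finishers = Queue.new # collects the job(s) that saw the counter hit zero

# Ten worker threads, each processing a slice of the articles.
threads = articles.each_slice(100).map do |slice|
  Thread.new do
    slice.each do |article|
      # process(article) would run here; decrement only after it succeeds
      finishers << article if countdown.decrement.zero?
    end
  end
end
threads.each(&:join)

puts finishers.size # exactly one job is responsible for writing end_time
```

Decrementing as the very last step matters: a job that decremented and then crashed would decrement a second time when Sidekiq retries it.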

This is exactly the problem Sidekiq Pro's Batch feature solves: you create a set of jobs, and Sidekiq calls your code once they have all completed.

class LoadFeed
  include Sidekiq::Worker

  def on_success(status, options)
    Log.find(options['log_id']).update_attribute(:end_time, Time.now)
  end

  def perform url
    log = Log.create! start_time: Time.now, url: url
    articles = load_feed(url) # this one loads the feed
    batch = Sidekiq::Batch.new
    batch.on(:success, self.class, 'log_id' => log.id)
    batch.jobs do
      articles.each do |article|
        ProcessArticle.perform_async(article, log.id)
      end
    end
  end
end
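Batches offer both a :success and a :complete callback, and for timing purposes the choice matters: :complete fires once every job in the batch has run at least once, successful or not, while :success fires only once every job has succeeded, so with on_success a failed article delays end_time until its retry succeeds. The toy model below illustrates that difference only; `ToyBatch` is hypothetical and is not how Sidekiq Pro is implemented.

```ruby
# Toy model of batch callbacks (not Sidekiq Pro's implementation):
# :complete fires once every job has run at least once, successful or not;
# :success fires only once every job has succeeded.
class ToyBatch
  attr_reader :events

  def initialize(job_ids)
    @pending = job_ids.to_a # jobs that have not succeeded yet
    @unrun = job_ids.to_a   # jobs that have not run at all yet
    @events = []
  end

  def job_finished(id, ok:)
    @unrun.delete(id)
    @pending.delete(id) if ok
    @events << :complete if @unrun.empty? && !@events.include?(:complete)
    @events << :success if @pending.empty? && !@events.include?(:success)
  end
end

batch = ToyBatch.new([1, 2, 3])
batch.job_finished(1, ok: true)
batch.job_finished(2, ok: false) # job 2 raises and is scheduled for retry
batch.job_finished(3, ok: true)
p batch.events # => [:complete]
batch.job_finished(2, ok: true)  # the retry succeeds
p batch.events # => [:complete, :success]
```

If you would rather measure the first full pass over the feed, registering `batch.on(:complete, ...)` with an `on_complete(status, options)` method may fit better than :success.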
