I have two Sidekiq jobs. The first loads a JSON feed of articles and splits it into one job per article. It also creates a log record and stores a `start_time`.
```ruby
class LoadFeed
  include Sidekiq::Worker

  def perform(url)
    log = Log.create!(start_time: Time.now, url: url)
    articles = load_feed(url) # loads and parses the feed
    articles.each do |article|
      ProcessArticle.perform_async(article, log.id)
    end
  end
end
```
The second job processes a single article and updates the `end_time` field of the log created earlier, so I can find out how long the whole process (loading the feed, splitting it into jobs, processing the articles) took.
```ruby
class ProcessArticle
  include Sidekiq::Worker

  def perform(data, log_id)
    process(data)
    Log.find(log_id).update_attribute(:end_time, Time.now)
  end
end
```
But now I have some problems / questions:

1) `Log.find(log_id).update_attribute(:end_time, Time.now)` isn't atomic, and because the jobs run asynchronously, this could lead to incorrect `end_time` values. Is there a way to do an atomic update of a `datetime` field in MySQL with the current time?

2) The feed can get pretty long (~800k articles), and updating a value 800k times when only the last write matters seems like a lot of unnecessary work. Any ideas how to find out which job was the last one, so that only that job updates the `end_time` field?
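To make the concern in 1) concrete: with `update_attribute(:end_time, Time.now)`, the timestamp is computed in Ruby before the write reaches the database, so two jobs can write in the opposite order from the one in which they read the clock. The plain-Ruby sketch below replays one such interleaving by hand. The `LogRecord` struct and the integer "times" are hypothetical stand-ins for the `Log` model and real timestamps, chosen so the result is reproducible.

```ruby
# Hypothetical stand-in for a row of the logs table.
LogRecord = Struct.new(:end_time)

log = LogRecord.new(nil)

# Each job reads the clock (Time.now in the real job) before writing.
# Plain integers replace real timestamps so the outcome is reproducible.
time_seen_by_job_a = 100 # job A reads the clock first
time_seen_by_job_b = 105 # job B reads the clock later

# The writes then reach the database in the opposite order.
log.end_time = time_seen_by_job_b # job B writes first
log.end_time = time_seen_by_job_a # job A writes last and overwrites B

puts log.end_time # => 100, even though the last job finished at 105
```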
For 1), you could skip the separate `SELECT` and let MySQL supply the time in a single `UPDATE` statement:

```ruby
Log.where(id: log_id).update_all('end_time = NOW()')
```
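The single `UPDATE` narrows the window, because MySQL takes the timestamp when the statement starts executing instead of Ruby taking it before the query is even sent. If you also need a guarantee that `end_time` never moves backwards, the write can be restricted to only move the value forward. The plain-Ruby sketch below models that rule in memory: a `Mutex` stands in for the row lock, and `LogRow` and `touch_end_time` are hypothetical names. In MySQL the same rule could be written as something like `SET end_time = GREATEST(COALESCE(end_time, NOW()), NOW())`; the `COALESCE` is needed because MySQL's `GREATEST` returns `NULL` if any argument is `NULL`.

```ruby
# In-memory stand-in for one row of the logs table. The Mutex plays the role
# of the row lock an UPDATE takes; end_time is only ever moved forward.
class LogRow
  attr_reader :end_time

  def initialize
    @end_time = nil
    @lock = Mutex.new
  end

  # Records a job's finish time, ignoring it if a later time is already stored.
  def touch_end_time(finished_at)
    @lock.synchronize do
      @end_time = finished_at if @end_time.nil? || finished_at > @end_time
    end
  end
end

row = LogRow.new
row.touch_end_time(105) # job B's write lands first
row.touch_end_time(100) # job A's stale write is ignored
puts row.end_time       # => 105
```

Compared with the replay of 1) above it, the only change is the comparison inside the lock, which makes the final value independent of the order in which writes arrive.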
For 2), one way to solve this would be to update the end time only once all articles have been processed, for example by keeping a boolean on each article that you can query. This does not reduce the number of queries, but it would certainly perform better.

```ruby
# `feed` and the `needs_processing` scope are hypothetical placeholders
# for your own models
if feed.articles.needs_processing.none?
  Log.where(id: log_id).update_all('end_time = NOW()')
end
```
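A different technique from the boolean check answers the "which job was last?" part of 2) directly: keep a counter of unprocessed articles, have every job decrement it once, and let only the job that brings it to zero write `end_time`. That replaces 800k log writes with 800k decrements and a single write. The sketch below runs the idea in memory with threads; `FeedRun` and `article_done` are hypothetical names, and the `Mutex` stands in for an atomic decrement in shared storage such as Redis `DECR`, which returns the new value in the same command. In a real deployment the counter would have to live outside the Ruby process (an assumption of this sketch), since Sidekiq jobs can run in separate processes.

```ruby
# Pending-article counter shared by all jobs of one feed run. Each job
# decrements it exactly once; only the caller that observes zero writes
# end_time. The Mutex stands in for an atomic decrement (e.g. Redis DECR).
class FeedRun
  attr_reader :end_time, :end_time_writes

  def initialize(article_count)
    @pending = article_count
    @end_time = nil
    @end_time_writes = 0
    @lock = Mutex.new
  end

  # Hypothetical hook called at the end of each ProcessArticle job.
  def article_done(now)
    remaining = @lock.synchronize { @pending -= 1 }
    return unless remaining.zero?

    # Exactly one caller sees zero, so this runs once per feed run.
    @end_time = now
    @end_time_writes += 1
  end
end

run = FeedRun.new(200)
threads = (1..200).map { Thread.new { run.article_done(Time.now) } }
threads.each(&:join)
puts run.end_time_writes # => 1
```

One caveat of this design: a job that dies permanently before decrementing leaves the counter above zero forever, so the decrement should be the last step of the job, after processing has succeeded.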
This is the problem Sidekiq Pro's Batch feature solves: you create a set of jobs, and it calls your code when they have all completed.

```ruby
class LoadFeed
  include Sidekiq::Worker

  # Batch callback: called once after every job in the batch has succeeded
  def on_success(status, options)
    Log.find(options['log_id']).update_attribute(:end_time, Time.now)
  end

  def perform(url)
    log = Log.create!(start_time: Time.now, url: url)
    articles = load_feed(url) # loads and parses the feed

    batch = Sidekiq::Batch.new
    batch.on(:success, self.class, 'log_id' => log.id)
    batch.jobs do
      articles.each do |article|
        ProcessArticle.perform_async(article, log.id)
      end
    end
  end
end
```