
Background job taking twice as long as the same operation within Rails

In my Rails application, I have a long calculation requiring a lot of database access.

In short, my calculation takes 25 seconds.

When I run the same calculation in a background job (one big single worker), it takes twice as long (i.e. 50 seconds). I have tried several techniques to move the job into a background process: Delayed Job, Sidekiq, and running the work inside my Rails process in a thread created for it. None of them helped; all of them double the run time.

This performance difference only exists in the Rails 'production' environment. It looks like Rails applies some optimisation that is not applied in my background job.

My technical environment is the following:

  • I am using Ruby 2.0 / Rails 4
  • I am using Unicorn (but I have the same problem without it).
  • The job uses Rails.cache to store some partial computations.
  • I am using PostgreSQL

Does anybody have a clue where this slowdown might come from?

I'm assuming you're comparing the background job's speed to the speed of running the operation during a web request? If so, you're likely benefiting from Rails's QueryCache, which caches database queries during a web request. Try disabling it as described here:

Disabling Rails SQL query caching globally

If that causes the web request version of the job to take as long as the background job, you've found your culprit. You can then enable the query cache on your background job to speed it up (if it makes sense for your application).
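To see why the query cache can matter for a calculation that issues many repeated queries, here is a minimal pure-Ruby sketch of the idea (not Rails's actual implementation): inside a cached block, identical SQL strings are answered from memory instead of hitting the database. `FakeConnection` and `run_query` are hypothetical stand-ins for a database connection. In Rails 4 the query cache is switched on per web request by the `ActiveRecord::QueryCache` middleware, so a worker running outside a request typically does not get it; in a real job you would wrap the job body in `ActiveRecord::Base.cache do ... end`.

```ruby
# Hypothetical stand-in for a database connection; counts real round trips.
class FakeConnection
  attr_reader :round_trips

  def initialize
    @round_trips = 0
    @query_cache = nil # nil means caching is disabled
  end

  # Enables an in-memory query cache for the duration of the block,
  # similar in spirit to ActiveRecord::Base.cache { ... }.
  def cache
    previous = @query_cache
    @query_cache ||= {}
    yield
  ensure
    @query_cache = previous
  end

  # Runs a query, answering repeated identical SQL from the cache when enabled.
  def run_query(sql)
    if @query_cache
      @query_cache.fetch(sql) { @query_cache[sql] = execute(sql) }
    else
      execute(sql)
    end
  end

  private

  # Pretends to hit the database; a real call would cost a round trip.
  def execute(sql)
    @round_trips += 1
    "result of #{sql}"
  end
end

# A calculation that repeats the same lookup many times.
def calculation(conn)
  10.times.map { conn.run_query("SELECT rate FROM settings WHERE id = 1") }
end

uncached = FakeConnection.new
calculation(uncached)
puts uncached.round_trips # => 10

cached = FakeConnection.new
cached.cache { calculation(cached) }
puts cached.round_trips # => 1
```

Note that Rails also clears its query cache whenever a write statement runs; this sketch ignores writes entirely.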

Background jobs are not meant to speed things up. Their main purpose is 'fire and forget': you remove 25 seconds of synchronous calculation from the request and run it asynchronously, even if that takes somewhat longer. That way you can tell the user right away that her request is being processed and return with the result later.

You can gain speed from background jobs by splitting a big task into several small ones and running them at the same time. In your case I think that is not possible, because the operations in your calculation depend on each other.
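When the parts really are independent, the splitting can be sketched in plain Ruby with threads. Under MRI's global VM lock, threads only overlap while waiting on I/O (such as database queries), so this helps I/O-bound work rather than pure CPU work. `slow_partial_sum` is a hypothetical placeholder for one independent piece of the calculation; with Sidekiq or Delayed Job, each chunk would instead be enqueued as its own job.

```ruby
# Hypothetical independent piece of work; sleep stands in for waiting on the database.
def slow_partial_sum(range)
  sleep 0.05
  range.reduce(0, :+)
end

# Splits 1..n into `parts` chunks, computes each chunk in its own thread,
# and combines the partial results.
def parallel_sum(n, parts)
  chunk_size = (n.to_f / parts).ceil
  chunks = (1..n).each_slice(chunk_size).map { |slice| slice.first..slice.last }
  threads = chunks.map { |chunk| Thread.new { slow_partial_sum(chunk) } }
  # Thread#value waits for each thread to finish and returns its result.
  threads.map(&:value).reduce(0, :+)
end

total = parallel_sum(1_000, 4)
puts total # => 500500
```

This only works because the chunks do not depend on each other; a calculation where each step needs the previous step's result cannot be split this way.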

So if you want to speed up your calculation, you need to look into denormalizing your data structure: store some of the values your big calculation needs at the moment the source data for them is updated. That way you compute less when the user requests the results and more when the data is stored. This is a good place to use a background job: once you finish updating the data, enqueue a background task that updates the caches. If the user requests the calculation before this task has finished, they will still have to wait for the cache to fill up.
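A minimal sketch of the denormalization idea, assuming the expensive calculation is an aggregate (here a plain sum) over many rows: instead of recomputing it on every request, the stored value is updated whenever the source data changes. `Account`, `add_entry` and `balance` are hypothetical names; in a Rails app the stored value would typically be a database column or a `Rails.cache` entry refreshed by a background job.

```ruby
# Hypothetical model: keeps both the raw entries and a denormalized total.
class Account
  def initialize
    @entries = []
    @cached_balance = 0 # denormalized value, updated on every write
  end

  # Write path: store the entry and update the precomputed total incrementally.
  def add_entry(amount)
    @entries << amount
    @cached_balance += amount
  end

  # Read path: constant time, no scan over the entries.
  def balance
    @cached_balance
  end

  # Full recomputation: what the expensive request-time calculation would do.
  def recompute_balance
    @entries.reduce(0, :+)
  end
end

account = Account.new
[100, -30, 45].each { |amount| account.add_entry(amount) }
puts account.balance           # => 115
puts account.recompute_balance # => 115
```

The trade-off is the one described above: work moves from the read path to the write path.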

Update: I think I still need to answer your main question. Basically, the additional time spent processing the background task comes from the implementation of the job system. Because of the 'fire and forget' approach, nobody needs the background task scheduler to consume a large amount of processor time just monitoring for new jobs. I am not completely sure, but I think that if your calculation were twice as complex, the extra time would still be the same 25 seconds.

My guess is that the extra time comes from the need for your background worker to load Rails and all of your application. My clue is that you said the difference was greatest with Rails in production mode. In production mode, subsequent calls to the app make use of the app and class cache.

How to check this hypothesis:

Change your background job to do the following:

  1. print a log message before you initiate the worker
  2. start the worker
  3. run your calculation. As part of your calculation startup, print a log message
  4. print another log message
  5. run your calculation again
  6. print another log message

Then compare the two times for running your calculation.

Of course, the second run will also benefit somewhat from database caching, code remaining resident in memory, etc. But if the second run is much, much faster, then the fact that the second run didn't restart Rails is the more significant factor.

Also, the time between the log messages from steps 1 and 3 will help you understand the startup time.
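The check above can be sketched as a small timing harness that logs timestamps and measures elapsed time with a monotonic clock. `boot_worker` and `run_calculation` are hypothetical placeholders for starting the worker (loading Rails) and for your real 25-second calculation; the step numbers in the log lines follow the list above.

```ruby
# Prints a message prefixed with a millisecond timestamp.
def log(message)
  puts "#{Time.now.strftime('%H:%M:%S.%L')} #{message}"
end

# Returns the elapsed wall-clock seconds for the block, using a monotonic clock.
def elapsed_seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Hypothetical placeholders: replace with real worker startup and the real calculation.
def boot_worker
  sleep 0.02
end

def run_calculation
  sleep 0.01
  42
end

log("step 1: starting the worker")
startup = elapsed_seconds { boot_worker }
log("step 3: worker startup took #{startup.round(3)}s, starting first run")
first_run = elapsed_seconds { run_calculation }
log("step 4: first run took #{first_run.round(3)}s")
second_run = elapsed_seconds { run_calculation }
log("step 6: second run took #{second_run.round(3)}s")
```

If the second run is much faster than the first, per-process warm-up rather than the job system itself is the likely cost.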

Fixes

Why wait? Most importantly: why do you need the results faster? For example, tell your user that the result will be emailed to them once it is calculated. Or let your user see that the calculation is proceeding in the background, and show them the result later.

The key for any long running calculation is to do it in the background and encourage the user to not wait for the result. They should be able to do something else until they get the result.

Start the calculation automatically. As soon as the user logs in, or after they do something interesting, start the calculation. That way, when (and if) the user asks for the calculation, the answer will either already be done or will be done soon.
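Starting the calculation early can be sketched with a background thread whose result is collected later; `Thread#value` blocks only if the calculation has not finished yet. `PrecomputedResults`, `on_login`, `result_for` and `expensive_calculation` are hypothetical names. In a real app the work would more likely go to Sidekiq or Delayed Job, with the result stored in `Rails.cache` or the database, since in-process threads do not survive a restart.

```ruby
# Hypothetical stand-in for the 25-second calculation.
def expensive_calculation(user_id)
  sleep 0.05
  user_id * 10
end

class PrecomputedResults
  def initialize
    @pending = {}
    @lock = Mutex.new
  end

  # Kicks off the calculation as soon as the user logs in; returns the thread.
  def on_login(user_id)
    @lock.synchronize do
      @pending[user_id] ||= Thread.new { expensive_calculation(user_id) }
    end
  end

  # When the user asks, the answer is either ready or soon ready.
  # Thread#value waits for the thread to finish and returns its result.
  def result_for(user_id)
    thread = @lock.synchronize { @pending[user_id] } || on_login(user_id)
    thread.value
  end
end

results = PrecomputedResults.new
results.on_login(7)
# ... the user does something else here ...
puts results.result_for(7) # => 70
```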

Cache the result and bust the cache as needed. Similar to the above, start the calculation periodically and automatically. If the user changes some data, restart the calculation by busting the cache. There are also ways to halt any ongoing calculation if the data changes during the calculation.
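Cache busting can be sketched with a version-stamped cache key: any change to the data bumps the version, so the next lookup misses and recomputes. This mirrors the common Rails pattern of putting `updated_at` into the cache key, but the `ResultCache` class below is a hypothetical, in-memory stand-in for `Rails.cache`, and it does not cancel a calculation that is already running.

```ruby
# Hypothetical in-memory stand-in for Rails.cache, keyed by data version.
class ResultCache
  attr_reader :computations

  def initialize(&calculation)
    @calculation = calculation
    @store = {}
    @data_version = 0
    @computations = 0
  end

  # Call whenever the source data changes: old entries become unreachable.
  def bust!
    @data_version += 1
  end

  # Returns the cached result for the current data version, computing it on a miss.
  def fetch
    key = "calculation/v#{@data_version}"
    @store.fetch(key) do
      @computations += 1
      @store[key] = @calculation.call
    end
  end
end

data = [1, 2, 3]
cache = ResultCache.new { data.reduce(0, :+) }
cache.fetch             # computes 6
cache.fetch             # served from the cache
data << 4
cache.bust!
puts cache.fetch        # => 10
puts cache.computations # => 2
```

Stale entries are simply left behind here; a real cache store would typically be configured to expire or evict them.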

Pre-calculate part of the calculation. Why does a DBMS calculation take 25 seconds or more? It could be that you should change the calculation. Investigate adding indexes, summary tables, de-normalizing, splitting the calculation into smaller steps that can be pre-calculated, etc.
