
Sidekiq concurrency and database connection pool

Here is my problem: each night, I have to process around 50k background jobs, each taking an average of 60 seconds. Those jobs basically call the Facebook, Instagram and Twitter APIs to collect users' posts and save them in my DB. The jobs are processed by Sidekiq.

At first, my setup was:

  • :concurrency: 5 in sidekiq.yml

  • pool: 5 in my database.yml

  • RAILS_MAX_THREADS set to 5 in my web server (Puma) configuration.
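
In config-file form, that was roughly the following (simplified to the relevant keys; the Puma snippet follows the standard Rails template rather than being copied from my app):

    # config/sidekiq.yml
    :concurrency: 5

    # config/database.yml (production, simplified)
    production:
      adapter: postgresql
      pool: 5

    # config/puma.rb (standard Rails template)
    threads_count = ENV.fetch("RAILS_MAX_THREADS") { 5 }
    threads threads_count, threads_count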

My understanding is:

  • my web server (rails s) will use at most 5 threads, hence at most 5 connections to my DB, which is OK as the connection pool is set to 5.

  • my Sidekiq process will use 5 threads (as the concurrency is set to 5), which is also OK as the connection pool is set to 5.

In order to process more jobs at the same time and reduce the overall time to run them all, I decided to increase the Sidekiq concurrency to 25. In production, I provisioned a Heroku Postgres Standard database with a maximum of 120 connections, to be sure the higher Sidekiq concurrency would have enough connections available.

Thus, now the setup is:

  • :concurrency: 25 in sidekiq.yml

  • pool: 25 in my database.yml

  • RAILS_MAX_THREADS set to 5 in my web server (Puma) configuration.
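
Or, in file form (simplified; the env-var variant in the comment is a common Heroku pattern I have seen suggested, not something I am currently running):

    # config/sidekiq.yml
    :concurrency: 25

    # config/database.yml (production, simplified)
    production:
      pool: 25
      # common Heroku pattern: size the pool per process instead, e.g.
      # pool: <%= ENV.fetch("DB_POOL") { ENV.fetch("RAILS_MAX_THREADS") { 5 } } %>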

I can see that 25 Sidekiq workers are busy, but each job now takes far longer (sometimes more than 40 minutes instead of 1 minute)!

Actually, I've been doing some tests and realized that processing 50 of my jobs with a Sidekiq concurrency of 5, 10 or 25 results in the same total duration, as if there were a bottleneck of 5 connections somewhere.
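
One way to check whether the connection pool really is the choke point is ActiveRecord's pool statistics (connection_pool.stat has existed since Rails 5.1; the sample output below is illustrative, not from my app):

    # Run inside the Sidekiq process, e.g. from within a job:
    pool = ActiveRecord::Base.connection_pool
    Sidekiq.logger.info(pool.stat.inspect)
    # With 25 threads fighting over 5 connections you would expect
    # something like:
    # {:size=>5, :connections=>5, :busy=>5, :dead=>0, :idle=>0,
    #  :waiting=>20, :checkout_timeout=>5}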

I have checked the Sidekiq documentation and some other posts on SO (sidekiq - Is concurrency > 50 stable?, Scaling sidekiq network archetecture: concurrency vs processes), but I haven't been able to solve my problem.

So I am wondering:

  • is my understanding of the Rails database.yml connection pool and Sidekiq concurrency right?

  • what's the correct way to set up those parameters?

Dropping this here in case someone else could use a quick, very general pointer:

Sometimes increasing the number of concurrent workers may not yield the expected results.

For instance, if there's a large discrepancy between the number of tasks and the number of cores, the scheduler will keep context-switching your tasks and there isn't much to gain; the jobs will take about the same time, or a bit longer.
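
As a toy illustration (plain Ruby, assuming MRI, where the global VM lock prevents CPU-bound threads from running in parallel), splitting the same CPU-bound work across threads doesn't speed it up:

    require "benchmark"

    # Pure CPU busy-work, no I/O.
    def burn(iterations)
      iterations.times { Math.sqrt(rand) }
    end

    total = 20_000_000

    # One thread doing all the work.
    single = Benchmark.realtime { burn(total) }

    # Four threads splitting the same work: on MRI this takes roughly the
    # same wall-clock time, since the GVL lets only one thread run Ruby
    # code at a time and the extra threads only add switching overhead.
    threaded = Benchmark.realtime do
      4.times.map { Thread.new { burn(total / 4) } }.each(&:join)
    end

    puts "single: #{single.round(2)}s, threaded: #{threaded.round(2)}s"

I/O-bound jobs such as API calls behave differently, since threads can overlap while they wait on the network.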

Here's a link to a rather interesting read on how job scheduling works: https://en.wikipedia.org/wiki/Scheduling_(computing)#Operating_system_process_scheduler_implementations

There are other aspects to consider as well, such as datastore access. Are your workers using the same table(s)? Is the table backed by a storage engine that locks it entirely on write, such as MyISAM? If so, it won't matter that you have 100 workers running at the same time with enough RAM and cores: they will all wait in line for whichever query is running to release the lock on the table they all need. This can also happen with engines such as InnoDB, which uses row-level locking instead of locking the entire table on write: different workers may still contend for the same rows, or large indexes may not lock anything but still slow writes down.
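
For example (hypothetical Post model and job, just to sketch the shape of the contention), workers updating the same row will serialize on that row's lock no matter how many Sidekiq threads you run:

    class SavePostJob
      include Sidekiq::Worker

      def perform(post_id, attrs)
        post = Post.find(post_id)
        # with_lock opens a transaction and issues SELECT ... FOR UPDATE:
        # any other worker updating this same row blocks here until the
        # lock is released, regardless of Sidekiq's concurrency setting.
        post.with_lock do
          post.update!(attrs)
        end
      end
    end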

Another issue I've encountered is Rails (which I'm assuming you're using) taking quite a toll on RAM in some cases, so you might want to look at your memory footprint as well.

My suggestion is to turn on logging and look at the data: where do your workers spend most of their time? Is it something on the network layer (unlikely)? Waiting to get access to a core? Reading from or writing to your datastore? Is your machine swapping?
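
A minimal sketch of that kind of logging (fetch_from_api and save_posts are hypothetical stand-ins for your actual API and DB code):

    class CollectPostsJob
      include Sidekiq::Worker

      def perform(user_id)
        t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
        posts = fetch_from_api(user_id)   # hypothetical network call
        t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
        save_posts(posts)                 # hypothetical DB write
        t2 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
        Sidekiq.logger.info(
          "user=#{user_id} api=#{(t1 - t0).round(2)}s db=#{(t2 - t1).round(2)}s"
        )
      end
    end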
