
Django REST Framework and Gunicorn: strange behavior in response time

SETUP: Azure cloud. Virtual machine nº 1 (Debian, 8 cores): Docker running the application (a Django REST API) under Gunicorn (17 sync workers).

Virtual machine nº 2 (Debian, 4 cores): MySQL database.

Note: The API receives a call from a user, makes 4 Django ORM queries to MySQL (SELECTs only), and returns OK or an error.

We are sending 220 requests per second from Apache JMeter for an unlimited time. The calls are 100% successful and take 1200 ms on average. We monitor the load on the 8 CPUs, and they run at roughly 50%.

Up to this point everything is fine and meets our requirements.

Trouble:

After some time X, the average response time suddenly jumps to about 4000 ms, roughly three times the normal value, and the CPUs go to 100% (htop shows half of each bar in red). The calls are still 100% successful.

Our analysis after hundreds of structured tests:

Time X usually depends on how long the system has rested since the last failure (the tripling of the response time). Normally it is about half of that rest time.

Example 1: If it is failing right now (that is, calls take 4000 ms) and I stop the test for 2 minutes, the test starts well and fails again after approximately 1 minute.
Example 2: If I stop it for 1 hour, it takes 30 minutes to fail.
Example 3: If I stop it for 15 hours, it takes approximately 7 hours to fail.

Time X does not depend on Docker or on the servers: if I stop the test while it is failing, restart both servers, and start the test again, it still fails after about half of the time elapsed since the test was stopped.

When everything is working well (1200 ms per call), MySQL serves 700-800 selects/second and has between 5 and 7 connections. When it is failing, it drops to about 180 selects/second and has approximately 15 connections.

We tried running queries against the database from outside the application to see whether the database was blocked, and it responded very quickly.
We tried synchronous and asynchronous workers, and both show the same behavior.
We swapped Gunicorn for uWSGI, and it does exactly the same thing.
We tried multiple Gunicorn configurations, and they all show the same behavior.
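As a rough sanity check on these figures, the select rate can be converted into the request rate the application is actually completing. This is only arithmetic on the numbers quoted above, assuming every API call issues exactly the 4 SELECTs described in the setup; the function name is made up for illustration.

```python
# Each API call runs 4 SELECTs (stated in the setup note of the question).
QUERIES_PER_REQUEST = 4


def implied_requests_per_second(selects_per_second, queries_per_request=QUERIES_PER_REQUEST):
    """Convert a MySQL select rate into the API request rate that would produce it."""
    return selects_per_second / queries_per_request


if __name__ == "__main__":
    for label, selects in (("healthy (low)", 700), ("healthy (high)", 800), ("degraded", 180)):
        rate = implied_requests_per_second(selects)
        print(f"{label}: {selects} selects/s is about {rate:.0f} requests/s")
```

Under that assumption the healthy state corresponds to about 175-200 requests per second, close to the 220 being sent, while the degraded state corresponds to about 45. The database is being asked for less work rather than answering slowly, which fits the observation that outside queries still return quickly.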

Can someone help me? Could this be a thundering herd problem? I do not know if it is a kernel issue or if I have to call an exorcist.

In the end I found the solution. It turns out that Azure has a type of virtual machine called the B-series burstable. The B-series lets you purchase a VM size with a baseline level of performance, and the VM builds up credits while it is using less than that baseline. Once the VM has accumulated credits, it can burst above the baseline, using up to 100% of the vCPU when your application needs more CPU performance. What I was experiencing was the credits running out: what looked like higher CPU usage was actually Azure limiting the processing power. The solution was to migrate to a machine with stable, non-burstable performance.
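The credit mechanism also explains why the time to failure was about half of the rest time. Here is a minimal sketch of a credit bucket, not Azure's actual accounting: credits are banked while CPU usage is below the baseline and burned while it is above. The percentages are hypothetical values chosen only so that banking is half as fast as burning; they are not the real figures for any VM size.

```python
def minutes_until_throttled(rest_minutes, baseline_pct=20, idle_pct=0, test_pct=60,
                            max_banked=None):
    """Estimate how long a load test runs before banked CPU credits are exhausted.

    Simplified model (an assumption, not Azure's exact rules):
    - the VM starts the rest period with zero credits (it was being throttled),
    - while resting it banks (baseline_pct - idle_pct) units per minute,
    - under load it burns (test_pct - baseline_pct) units per minute,
    - max_banked optionally caps how many units can be accumulated.
    All default percentages are hypothetical.
    """
    if not idle_pct < baseline_pct < test_pct:
        raise ValueError("expected idle_pct < baseline_pct < test_pct")
    banked = rest_minutes * (baseline_pct - idle_pct)
    if max_banked is not None:
        banked = min(banked, max_banked)
    return banked / (test_pct - baseline_pct)


if __name__ == "__main__":
    for rest in (2, 60, 15 * 60):
        print(f"rest {rest} min -> throttled after about {minutes_until_throttled(rest):.0f} min")
```

With these made-up rates, a 2 minute rest gives about 1 minute of good performance, 1 hour gives 30 minutes, and 15 hours gives 7.5 hours, which is the same pattern as in the three examples from the question. In this model a cap on banked credits (max_banked) would eventually break the proportionality for very long rests.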

https://learn.microsoft.com/en-us/azure/virtual-machines/sizes-b-series-burstable
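To check whether a machine is affected, one option is to read the VM size from the Azure Instance Metadata Service and see whether it belongs to the B-series. The sketch below assumes the metadata endpoint is reachable from where the script runs (it may not be from every container network setup) and that B-series size names start with "Standard_B" (for example Standard_B2ms). The helper names are made up for illustration.

```python
import urllib.request

# Azure Instance Metadata Service: only reachable from inside an Azure VM.
IMDS_VM_SIZE_URL = (
    "http://169.254.169.254/metadata/instance/compute/vmSize"
    "?api-version=2021-02-01&format=text"
)


def is_b_series(vm_size):
    """Return True if the size name looks like a burstable B-series size.

    Assumption: B-series names start with 'Standard_B', e.g. 'Standard_B2ms'.
    """
    return vm_size.strip().lower().startswith("standard_b")


def fetch_vm_size(timeout=2.0):
    """Ask the metadata service for this VM's size name."""
    request = urllib.request.Request(IMDS_VM_SIZE_URL, headers={"Metadata": "true"})
    # The metadata service should not be reached through a proxy, so use an
    # opener with proxies disabled.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    size = fetch_vm_size()
    kind = "burstable (B-series)" if is_b_series(size) else "not B-series"
    print(f"{size}: {kind}")
```

If the size turns out to be burstable, the remaining CPU credits are the number to watch during a long load test, since raw CPU percentage alone looked like a saturated application rather than a throttled VM.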
