Random “502 Error Bad Gateway” in Amazon Red Hat (Not Ubuntu) - Nginx + PHP-FPM

Question

First of all, I have already searched Stack Overflow for 502 errors. There are a lot of threads, but the difference here is that the error appears without any pattern and the server is not running Ubuntu.

Everything works perfectly, but about once a week my site shows "502 Bad Gateway". After this first error, every request gets the same message. Restarting MySQL, PHP-FPM, Nginx and Varnish doesn't fix it.

To get my site up again, I have to clone this instance and launch a new one (it is hosted on Amazon EC2).

The Nginx error log shows this line over and over:

[error] 16773#0: *7034 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1

There is nothing in the MySQL or Varnish logs, but the PHP-FPM log shows lines like these:

WARNING: [pool www] child 18978, script '/var/www/mysite.com/index.php' (request: "GET /index.php") executing too slow (10.303579 sec), logging
WARNING: [pool www] child 18978, script '/var/www/mysite.com/index.php' (request: "GET /index.php") execution timed out (16.971086 sec), terminating

The PHP-FPM slow log showed:

[pool www] pid 20401
script_filename = /var/www/mysite.com/index.php
w3_require_once() /var/www/mysite.com/wp-content/plugins/w3-total-cache/inc/define.php:1478

(Line 1478 of define.php is: require_once $path;)

I thought the problem was the W3 Total Cache plugin, so I removed it. About 5 days later it happened again, this time with this entry in the PHP-FPM slow log:

script_filename = /var/www/mysite.com/index.php
wpcf7_load_modules() /var/www/mysite.com/wp-content/plugins/contact-form-7/includes/functions.php:283

(Line 283 of functions.php is: include_once $file;)

Another time, the first error occurred in a different place:

script_filename = /var/www/mysite.com/wp-cron.php
curl_exec() /var/www/mysite.com/wp-includes/class-http.php:1510

And yet another part of the code:

[pool www] pid 20509
script_filename = /var/www/mysite.com/index.php
mysql_query() /var/www/mysite.com/wp-includes/wp-db.php:1655

CPU, RAM and everything else are stable when this error occurs (less than 20% usage).

I tried everything, but nothing worked:

Moved to a bigger server (more CPU and RAM)
Decreased the Nginx, PHP-FPM and MySQL timeouts (my pages load quickly, so I lowered the timeouts to kill any outlier processes)
Changed the number of PHP-FPM spare servers
Changed a lot of Nginx and PHP-FPM settings
I know there is a bug with PHP-FPM on Ubuntu that can cause this error, but I don't think the same bug affects Amazon (Red Hat) instances. (And I don't want to switch PHP-FPM from TCP to Unix sockets, because I've read that sockets don't work well under heavy load.)

This has been happening about once a week for the last 5 months. I'm desperate. I even added Nginx and PHP-FPM restarts to the crontab so these services restart every day, but that didn't help either.

Does anyone have any suggestions on how to solve this problem? Anything will help!

Server:

 Amazon c3.large (2 cores and 3.75 GB RAM)
 Linux Amazon Red Hat 4.8.2 64bits

PHP-FPM:

 listen = 127.0.0.1:9000
 listen.allowed_clients = 127.0.0.1
 listen.mode = 0664
 pm = ondemand
 pm.max_children = 480
 pm.start_servers = 140
 pm.min_spare_servers = 140
 pm.max_spare_servers = 250
 pm.max_requests = 50
 request_terminate_timeout = 15s
 request_slowlog_timeout = 10s
 php_admin_flag[log_errors] = on

Nginx:

  worker_processes  2;
  events {
      worker_connections  2048;
      multi_accept        on;
      use                 epoll;
  }
  http {
      include       /etc/nginx/mime.types;
      default_type  application/octet-stream;
      access_log  off;
      sendfile        on;
      tcp_nopush     on;
      tcp_nodelay    on;
      types_hash_max_size 2048;
      server_tokens off;
      client_max_body_size 8m;
      reset_timedout_connection on;
      index index.php index.html index.htm;
      keepalive_timeout  1;
      proxy_connect_timeout  30s;
      proxy_send_timeout  30s;
      proxy_read_timeout  30s;
      fastcgi_send_timeout 30s;
      fastcgi_read_timeout 30s;
      # listen and location are only valid inside a server block
      # (server_name and root omitted here)
      server {
          listen 127.0.0.1:8080;
          # escape the dot so only names ending in ".php" match
          location ~ \.php$ {
              try_files $uri =404;
              include fastcgi_params;
              fastcgi_index index.php;
              fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
              fastcgi_keep_conn on;
              fastcgi_pass 127.0.0.1:9000;
              fastcgi_param HTTP_HOST $host;
          }
      }
  }

Answer 1

I would start by tuning some configuration parameters.

PHP-FPM

I think your pm values are somewhat off: higher than what I normally see configured on servers with your specs. But you say memory consumption is normal, which is a bit odd.

Anyway, for pm.max_children = 480, considering that WordPress raises the memory limit to 40 MB by default, you could end up using up to about 19 GB of memory (480 × 40 MB = 19,200 MB), so you definitely want to lower that.
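As a quick sanity check of that figure, here is the arithmetic as a small Python sketch. It assumes every child can grow to the 40 MB limit; the function name is only illustrative, and real per-process usage is usually below the limit.

```python
def worst_case_memory_mb(max_children: int, memory_limit_mb: int) -> int:
    """Upper bound on PHP-FPM memory: every child running at its memory limit."""
    return max_children * memory_limit_mb


if __name__ == "__main__":
    # Values from the question's pool (pm.max_children = 480) and WordPress's 40 MB limit
    total_mb = worst_case_memory_mb(max_children=480, memory_limit_mb=40)
    print(f"{total_mb} MB (~{total_mb / 1024:.2f} GiB)")
```

That is roughly five times the 3.75 GB of RAM on a c3.large.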

See part 4 of this post for more about that: http://www.if-not-true-then-false.com/2011/nginx-and-php-fpm-configuration-and-optimizing-tips-and-tricks/

If you reserve, say, 512 MB for Nginx, MySQL, Varnish and other services, that leaves about 3328 MB for PHP-FPM (3840 MB minus 512 MB). Divided by 40 MB per process, pm.max_children should be about 80, and even 80 is very high.
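The same budget calculation as a Python sketch. The 3840 MB total (3.75 GB on a c3.large), the 512 MB reservation and the 40 MB per process are the assumptions from the paragraph above; the helper name is hypothetical.

```python
def max_children_for_budget(total_ram_mb: int, reserved_mb: int, per_process_mb: int) -> int:
    """Largest pm.max_children whose worst case fits in the RAM left for PHP-FPM."""
    available_mb = total_ram_mb - reserved_mb
    if available_mb <= 0 or per_process_mb <= 0:
        raise ValueError("need positive available memory and per-process size")
    # Integer division: a partial process does not fit
    return available_mb // per_process_mb


if __name__ == "__main__":
    # c3.large: 3840 MB total, 512 MB assumed for Nginx, MySQL, Varnish, etc.
    print(max_children_for_budget(3840, 512, 40))
```

This gives 83, i.e. the "about 80" above; treat it as a ceiling, not a target.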

You can probably also lower pm.start_servers, pm.min_spare_servers and pm.max_spare_servers. I prefer to keep them low and only increase them when necessary. Note that with pm = ondemand, as in your pool, these three settings are ignored; they only take effect with pm = dynamic.
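A small Python sketch that flags spare-server settings which have no effect under the pool's pm mode, and, for pm = dynamic, runs roughly the consistency checks PHP-FPM applies at startup (paraphrased, not its exact error messages). The function name and the dict representation of the pool are assumptions for illustration.

```python
from typing import Dict, List, Union

SPARE_KEYS = ("pm.start_servers", "pm.min_spare_servers", "pm.max_spare_servers")


def spare_server_warnings(pool: Dict[str, Union[str, int]]) -> List[str]:
    """Flag spare-server settings that are ignored or inconsistent (sketch)."""
    pm = str(pool["pm"])
    if pm != "dynamic":
        # With pm = ondemand or static these three settings have no effect
        return [f"{key} is ignored when pm = {pm}" for key in SPARE_KEYS if key in pool]

    max_children = int(pool["pm.max_children"])
    min_spare = int(pool["pm.min_spare_servers"])
    max_spare = int(pool["pm.max_spare_servers"])
    # Documented default when unset: (min_spare_servers + max_spare_servers) / 2
    start = int(pool.get("pm.start_servers", (min_spare + max_spare) // 2))

    problems: List[str] = []
    if min_spare <= 0 or max_spare <= 0:
        problems.append("spare server counts must be positive")
    if min_spare > max_children or max_spare > max_children:
        problems.append("spare server counts cannot exceed pm.max_children")
    if max_spare < min_spare:
        problems.append("pm.max_spare_servers must not be less than pm.min_spare_servers")
    if not min_spare <= start <= max_spare:
        problems.append("pm.start_servers must lie between the min and max spare servers")
    return problems


if __name__ == "__main__":
    # The pool from the question
    question_pool = {
        "pm": "ondemand",
        "pm.max_children": 480,
        "pm.start_servers": 140,
        "pm.min_spare_servers": 140,
        "pm.max_spare_servers": 250,
    }
    for warning in spare_server_warnings(question_pool):
        print(warning)
```

For the question's pool this prints one "ignored" line per spare-server setting, because the pool uses pm = ondemand.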

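For pm.max_requests, use a higher value such as 500 (the value suggested in the stock www.conf; PHP-FPM's built-in default is 0, which means children are never respawned) to avoid constant worker respawns. I think lowering it is only advisable if you suspect memory leaks.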
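To see why a low pm.max_requests matters, here is a rough Python estimate of how often children get recycled. The 20 requests per second is a made-up load figure for illustration, not something from the question, and the helper name is hypothetical.

```python
def respawns_per_hour(requests_per_second: float, max_requests: int) -> float:
    """Approximate child respawns per hour: each child exits after max_requests requests."""
    if max_requests == 0:
        return 0.0  # pm.max_requests = 0 means children are never respawned
    return requests_per_second * 3600 / max_requests


if __name__ == "__main__":
    for limit in (50, 500, 0):
        print(limit, respawns_per_hour(20, limit))
```

At 20 req/s, pm.max_requests = 50 means about 1440 respawns per hour, versus about 144 with 500.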

Nginx

Change keepalive_timeout to 60 to make better use of keep-alive connections.

Other than that, I think everything looks normal.

I had this issue on Ubuntu, and tuning request_terminate_timeout in PHP-FPM together with fastcgi_send_timeout and fastcgi_read_timeout in Nginx was enough to get rid of it.
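One plausible reading of why those timeouts matter here, as a toy Python model (a simplification, not Nginx's actual logic). In the question's setup, request_terminate_timeout (15s) is shorter than fastcgi_read_timeout (30s), so PHP-FPM kills a slow child before Nginx gives up. Nginx then loses the upstream connection and answers with a 502, which matches the "execution timed out ... terminating" and "recv() failed (104: Connection reset by peer)" lines in the logs. If Nginx's timeout fired first, you would get a 504 instead. The function and parameter names are hypothetical.

```python
import math
from typing import Optional


def predicted_gateway_error(request_s: float,
                            terminate_timeout_s: float,
                            read_timeout_s: float) -> Optional[int]:
    """Toy model: which side gives up first on a PHP request lasting request_s seconds.

    Simplification: assumes the script sends no output until it finishes, so
    fastcgi_read_timeout (really a limit between two reads) acts on the whole request.
    """
    # request_terminate_timeout = 0 disables PHP-FPM's kill
    fpm_limit = terminate_timeout_s if terminate_timeout_s > 0 else math.inf
    if request_s < min(fpm_limit, read_timeout_s):
        return None  # the response arrives before either limit
    if fpm_limit <= read_timeout_s:
        return 502  # PHP-FPM kills the child first; Nginx loses the upstream connection
    return 504  # Nginx's fastcgi_read_timeout fires first: 504 Gateway Time-out


if __name__ == "__main__":
    # Question's values: request_terminate_timeout = 15s, fastcgi_read_timeout = 30s
    print(predicted_gateway_error(17, 15, 30))
    print(predicted_gateway_error(2, 15, 30))
```

With the question's values, a request running about 17 seconds (like the 16.97 s timeout in the PHP-FPM log) maps to 502, while a fast request gets no error.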

I hope you can fix it!

(Answer 1 posted 2015-04-17 02:10:13)
