AWS LAMP environment with high spiking DB connections

I have a WordPress/WooCommerce site that I migrated to a new environment. I upgraded my EC2 instance from a legacy m4.10xlarge to a new m5.8xlarge. Some major differences: the old machine ran legacy Amazon Linux 1 with PHP 7.2, while the new machine runs Amazon Linux 2 with PHP 7.4, and I made a copy of my database on Amazon RDS, upgrading it from MySQL 5.6 to 5.7. The instance is behind a load balancer, which I changed from an old Classic Load Balancer to a new Application Load Balancer.

This environment is working, except that the load balancer now has a very high connection count and the RDS instance has spiking database connections. Sometimes connections sit around 10, spike randomly to 200-400, and then drop back down. During this time the site runs extremely slowly, and certain pages sometimes return a 504 Gateway Timeout.

This behavior definitely did not exist in my old environment, and I have gone through a lot of steps to try to resolve this issue. Usually my old RDS DB connections would hover around 20 on average. I have spent many hours on the phone with Amazon technical support, but they just tell me to speak to different teams; it goes in circles and ends up with no result.

I have tried tweaking the /etc/httpd/conf/httpd.conf file, setting certain values that I read about or that were suggested to me, such as:

KeepAlive On
KeepAliveTimeout 5
MaxKeepAliveRequests 500
TimeOut 300
AcceptFilter http none
AcceptFilter https none

# prefork MPM sizing: up to 1000 worker processes, each recycled
# after 10000 connections
<IfModule mpm_prefork_module>
      StartServers           300
      MinSpareServers        50
      MaxSpareServers        100
      ServerLimit            1000
      MaxRequestWorkers      1000
      MaxConnectionsPerChild 10000
</IfModule>

I have tried tweaking them, but to no avail; the connections still spike. I have also tried setting values in my RDS parameter group to limit the connections, such as wait_timeout = 10, interactive_timeout = 60, and net_write_timeout = 60.
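For reference, one way I check what those spiking connections are actually doing is to query MySQL directly during a spike (a sketch; the endpoint and user below are placeholders for my own):

# Current vs. peak connection counts since the last restart
mysql -h mydb.xxxxx.us-east-1.rds.amazonaws.com -u admin -p \
  -e "SHOW GLOBAL STATUS LIKE 'Threads_connected'; SHOW GLOBAL STATUS LIKE 'Max_used_connections';"

# What every open connection is doing right now
mysql -h mydb.xxxxx.us-east-1.rds.amazonaws.com -u admin -p \
  -e "SHOW FULL PROCESSLIST;"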

I even tried to switch from the prefork module to the event module and use php-fpm and FastCGI. Whenever I switched to this, my web pages would rarely work and 50% of the time returned a 504 Gateway Timeout error, so I reverted to the prefork module.
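For reference, a minimal event-MPM/php-fpm wiring on Amazon Linux 2 looks roughly like the sketch below. The socket path is the distribution's php-fpm default and is an assumption here; if the php-fpm pool's pm.max_children is left too low, requests queue on that socket and time out, which may be where my 504s came from:

# /etc/httpd/conf.modules.d/00-mpm.conf -- load event instead of prefork
LoadModule mpm_event_module modules/mod_mpm_event.so

# Route PHP to the php-fpm Unix socket (default path on Amazon Linux 2;
# requires mod_proxy and mod_proxy_fcgi)
<FilesMatch \.php$>
    SetHandler "proxy:unix:/run/php-fpm/www.sock|fcgi://localhost"
</FilesMatch>

# Give slow requests headroom before Apache answers 504
ProxyTimeout 300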

The last group of settings I tried tuning were some TCP network values in the /etc/sysctl.conf file:

net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_fin_timeout = 30

# Protect Against TCP Time-Wait
net.ipv4.tcp_rfc1337 = 1

# Decrease the default keepalive timers for idle connections
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 60
net.ipv4.tcp_keepalive_intvl = 20

# Increase the TCP max buffer size settable via setsockopt()
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432

# Increase Linux autotuning TCP buffer limits (min, default, max bytes);
# 16 MB max is typical for 1 GbE, 32 MB or more for 10 GbE
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432

# How much packet processing can be spent among all NAPI structures
# registered to a CPU per polling cycle
net.core.netdev_budget = 600

# Maximum number of packets queued on the input side when an interface
# receives traffic faster than the kernel can process it
net.core.netdev_max_backlog = 3000000

# Upper limit on the socket listen() backlog; this also caps the
# effective SYN backlog (net.ipv4.tcp_max_syn_backlog)
net.core.somaxconn = 1000000
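For completeness, changes to /etc/sysctl.conf take effect only after a reload:

# Reload kernel parameters from /etc/sysctl.conf
sudo sysctl -p

# Verify a single value took effect
sysctl net.core.somaxconn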

Nothing I have tried has reduced these spikes in the database or the high connection count on the load balancer. When I run top on my server, I sometimes see a very normal, low load average, and then it will spike hugely, going way over the 32 cores on the new machine. I have seen the load average as high as 150 before it drops back down.

None of my tweaks or tuning have resulted in anything I can notice when looking at netstat or top. The results are still the same and the behavior never changes.
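For reference, this is the kind of snapshot I take with netstat during a spike (standard netstat/awk, nothing environment-specific):

# Count TCP sockets by state (watch for ESTABLISHED or TIME_WAIT pile-ups)
netstat -ant | awk 'NR>2 {print $6}' | sort | uniq -c | sort -rn

# Count connections per remote address (e.g. ALB health checks vs. RDS)
netstat -ant | awk 'NR>2 {print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head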

If anyone has any idea of what I could try or look into next, or any advice at all, it would be greatly appreciated.

How often does that load balancer check the activity on each server? What is the average response time for your app? If the former is longer than the latter, then the balancer is causing the problem.

How many servers are you balancing among?

Beg for "round robin", not something "smart". It is better for low-latency apps.

If, on the other hand, your app is taking 10 seconds or more for any query, then you need to pursue that.
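On RDS, the usual way to catch queries in that range is the slow query log, enabled through the DB parameter group; a sketch using the AWS CLI, with a placeholder parameter group name and an illustrative 2-second threshold:

# Turn on the slow query log and log anything over 2 seconds
aws rds modify-db-parameter-group \
  --db-parameter-group-name my-mysql57-params \
  --parameters "ParameterName=slow_query_log,ParameterValue=1,ApplyMethod=immediate" \
               "ParameterName=long_query_time,ParameterValue=2,ApplyMethod=immediate"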
