
AWS Load Balancer 502 Bad Gateway

I have microservices written in Node/Express, hosted on EC2 behind an Application Load Balancer.

Some users are getting a 502 before the request even reaches the server.

I log every request on each instance, and there are no log entries for the failing requests. I can see the request immediately before the 502 and the requests right after it, which is why I assume the request never reaches the servers. Most users get around it by refreshing the page or using an incognito tab, which makes the connection go to a different machine (we have 6).

I can tell from the load balancer logs that it responds to the request with a 502 almost immediately. My guess is that this could be a TCP RST.

I had a similar problem a long time ago and had to add keepAliveTimeout and headersTimeout to the Node configuration. Here are my settings (still using the LB default idle timeout of 60 s):

server.keepAliveTimeout = 65000;
server.headersTimeout = 80000;
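
For context, here is a minimal sketch of where these two settings sit on a Node HTTP server. It uses the built-in http module rather than Express so that it stays self-contained; with Express, the object returned by app.listen() is the same kind of http.Server. The names ALB_IDLE_TIMEOUT_MS and applyAlbTimeouts are hypothetical, and the port is only an example.

```javascript
const http = require('http');

// Assumption: the ALB idle timeout is left at its 60 s default.
const ALB_IDLE_TIMEOUT_MS = 60000;

// Hypothetical helper: keep idle connections open longer than the ALB does,
// so the ALB should not try to reuse a connection the target already closed.
function applyAlbTimeouts(server) {
  server.keepAliveTimeout = ALB_IDLE_TIMEOUT_MS + 5000;      // 65000
  server.headersTimeout = server.keepAliveTimeout + 15000;   // 80000, kept above keepAliveTimeout
  return server;
}

const server = applyAlbTimeouts(
  http.createServer((req, res) => {
    res.end('ok');
  })
);

// server.listen(3000); // example port, uncomment to actually serve traffic
```

The idea is simply that the target's keep-alive timeout stays above the load balancer's idle timeout, and headersTimeout stays above keepAliveTimeout.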

The metrics for all instances, especially memory and CPU usage, are fine.

These 502 errors started after an update in which we introduced several packages, for instance axios. At first I thought it could be axios, because keep-alive is not enabled by default, but enabling it did not fix the problem. Other than axios, we only use request.
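
For readers wondering what the axios keep-alive attempt might look like, here is a rough sketch. It only builds the keep-alive agents with Node's built-in http and https modules; the axios wiring is shown in comments because axios is a third-party package, and the variable names are hypothetical. Note that this affects outbound requests made by the service, and according to the question it did not resolve the 502s.

```javascript
const http = require('http');
const https = require('https');

// Agents that reuse sockets for outbound requests instead of opening a new one each time.
const keepAliveHttpAgent = new http.Agent({ keepAlive: true });
const keepAliveHttpsAgent = new https.Agent({ keepAlive: true });

// With axios installed, the agents would typically be passed like this:
// const axios = require('axios');
// const client = axios.create({
//   httpAgent: keepAliveHttpAgent,
//   httpsAgent: keepAliveHttpsAgent,
// });

console.log(keepAliveHttpAgent.options.keepAlive, keepAliveHttpsAgent.options.keepAlive);
```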

Any tips on how I should debug or fix this issue?

HTTP 502 errors usually point to a problem at the load balancer, which would explain why the requests never reach your server: presumably the load balancer can't reach the server for some reason.

This link has some hints on how to get logs from a Classic Load Balancer. However, since you didn't specify, you might be using an Application Load Balancer, in which case this link might be more useful.

From the ALB access logs I knew that either the ALB couldn't connect to the target or the connection was being terminated immediately by the target.
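
As an illustration of how such entries could be picked out of the access logs, here is a small sketch that parses one log line and flags a 502 where no status code came back from the target. The field order follows the documented ALB access log layout for the leading fields only; the function names and the sample line are made up for this example and are not taken from a real log.

```javascript
// Parse the leading fields of an ALB access log line.
function parseAlbLogLine(line) {
  // Split on whitespace, but keep double-quoted fields (such as the request line) together.
  const tokens = line.match(/"[^"]*"|\S+/g) || [];
  return {
    type: tokens[0],
    time: tokens[1],
    elb: tokens[2],
    client: tokens[3],
    target: tokens[4],
    requestProcessingTime: Number(tokens[5]),
    targetProcessingTime: Number(tokens[6]),
    responseProcessingTime: Number(tokens[7]),
    elbStatusCode: tokens[8],
    targetStatusCode: tokens[9],
    request: (tokens[12] || '').replace(/^"|"$/g, ''),
  };
}

// True when the ALB answered 502 and logged no status code from the target.
function is502WithoutTargetResponse(entry) {
  return entry.elbStatusCode === '502' && entry.targetStatusCode === '-';
}

// Made-up sample line, for illustration only.
const sample =
  'https 2024-01-01T00:00:00.000000Z app/example-alb/0123456789abcdef ' +
  '203.0.113.10:50000 10.0.0.5:3000 0.001 -1 -1 502 - 900 300 ' +
  '"GET https://example.com:443/api HTTP/1.1" "curl/8.0"';

const entry = parseAlbLogLine(sample);
console.log(entry.request, is502WithoutTargetResponse(entry));
```

Filtering real logs this way can help narrow down which clients, paths, and targets are involved before trying to reproduce the error.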

The most difficult part was figuring out how to replicate the 502 error.

It looks like the Node version I was using has a request header size limit of 8 KB. If any request exceeded that limit, the target would reject the connection and the ALB would return a 502 error.
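
To make the size limit concrete, here is a rough sketch of how request header size could be estimated, together with a hypothetical helper that builds an oversized header (a large Cookie, as one example) for reproducing the failure. The function names and the sample headers are assumptions for illustration; only the 8 KB figure comes from the text above.

```javascript
// Approximate the on-the-wire size of request headers from req.rawHeaders,
// which is a flat [name, value, name, value, ...] array.
function headerBytes(rawHeaders) {
  let total = 0;
  for (let i = 0; i < rawHeaders.length; i += 2) {
    // Each header is sent as "Name: value\r\n".
    total += Buffer.byteLength(rawHeaders[i]) + 2 + Buffer.byteLength(rawHeaders[i + 1]) + 2;
  }
  return total;
}

// Hypothetical helper: a Cookie value that by itself exceeds the given limit.
function oversizedCookie(limitBytes) {
  return 'pad=' + 'x'.repeat(limitBytes);
}

const LIMIT = 8 * 1024; // the 8 KB limit described above
const raw = ['Host', 'example.com', 'Cookie', oversizedCookie(LIMIT)];

console.log(headerBytes(raw) > LIMIT); // true
```

Requests over the limit never reach the application handler, so logging headerBytes(req.rawHeaders) in a middleware would only show how close normal traffic gets to the limit. Sending a request with a header like the one built above is one possible way to trigger the rejection on purpose.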

Solution:

I solved the issue by adding --max-http-header-size=size to the node start command line, where size is a value in bytes greater than 8 KB.
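
One way to confirm that the flag took effect is to print the limit from inside the process at startup. This is a small sketch assuming a Node version that exposes http.maxHeaderSize; the file name server.js and the value 16384 in the comment are example values, not from the original post.

```javascript
const http = require('http');

// Started as, for example:
//   node --max-http-header-size=16384 server.js
// (server.js and 16384 are example values)
function describeHeaderLimit() {
  return `max HTTP header size: ${http.maxHeaderSize} bytes`;
}

console.log(describeHeaderLimit());
```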

A few common reasons for an AWS Load Balancer 502 Bad Gateway:

  1. Make sure the public subnets that your ALB targets are set to auto-assign a public IP, so that instances deployed in them get a public IP automatically.
  2. Make sure the security group for your ALB allows HTTP and/or HTTPS traffic from the IPs you are connecting from.
