AWS HTTP API Gateway 503 Service Unavailable

Question

I have an HTTP API Gateway with an HTTP integration whose backend server runs on EC2. The API receives many requests during the day, and looking at the logs I realized that it sometimes returns an HTTP 503 with this body:

{ "message": "Service Unavailable" }

When I found this out, I tested the API by running the same HTTP request many times in Postman. Out of twenty attempts, I get at least one 503.

At first I thought the HTTP integration server was busy, but it is not under load, and when I call it directly I get a 200 response every time.

The timeout parameter is set to 30000 ms and the endpoint's average response time is 200 ms, so the timeout is not the problem. Also, the 503 is returned instantly, not 30 seconds after the request.

Can anyone help me?

Thanks

Answer 1

I solved this issue by editing the keep-alive connection parameters of my internal integration server. AWS API Gateway needs the keep-alive parameters to be at a standard configuration, so I kept tweaking my NGINX server parameters until the issue was solved.

Answer 2

I had the same issue with a self-made Node microservice integrated into AWS API Gateway. After some reconfiguration of the CloudWatch Logs output, I got a further indicator of what was wrong: INTEGRATION_NETWORK_FAILURE

Verify that your problem is similar, i.e. through more detailed log output

In API Gateway, under Logging, add more output in "Log format". Use this or similar content for "Log format":

{"httpMethod":"$context.httpMethod","integrationErrorMessage":"$context.integrationErrorMessage","protocol":"$context.protocol","requestId":"$context.requestId","requestTime":"$context.requestTime","resourcePath":"$context.resourcePath","responseLength":"$context.responseLength","routeKey":"$context.routeKey","sourceIp":"$context.identity.sourceIp","status":"$context.status","errMsg":"$context.error.message","errType":"$context.error.responseType","intError":"$context.integration.error","intIntStatus":"$context.integration.integrationStatus","intLat":"$context.integration.latency","intReqID":"$context.integration.requestId","intStatus":"$context.integration.status"}
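With a format like the one above, each access-log entry is a single JSON object, so failing requests can be picked out of exported log lines programmatically. The following is a minimal sketch, not part of the original answer: the helper name `hasMarker` is hypothetical, and because the answer does not say which field carries INTEGRATION_NETWORK_FAILURE, the sketch simply searches every field value for it.

```javascript
// Hypothetical helper: returns true when an access-log line (one JSON object,
// as produced by the "Log format" above) contains the given marker in any field.
// Which field actually carries the marker is not assumed here.
function hasMarker(logLine, marker = 'INTEGRATION_NETWORK_FAILURE') {
  let entry;
  try {
    entry = JSON.parse(logLine);
  } catch {
    return false; // not a JSON log line, ignore it
  }
  if (entry === null || typeof entry !== 'object') {
    return false;
  }
  return Object.values(entry).some((value) => String(value).includes(marker));
}
```

Feeding it the lines of a log export, for example `lines.filter((line) => hasMarker(line))`, should leave only the requests that hit the integration failure.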

After calling the API Gateway endpoint and getting a failure, consult the logs again. The failing entry should now carry the extra integration fields, including the INTEGRATION_NETWORK_FAILURE indicator mentioned above.

Solve in NodeJS Microservice (using Express)

Add timeouts for headers and keep-alive to the Express server's socket configuration once it is listening.

const app = require('express')();

// Optional: if keep-alive is not already advertised in the HTTP response
// and you need it to be, you might want to use this middleware.
/*
app.use((req, res, next) => {
    res.setHeader('Connection', 'keep-alive');
    res.setHeader('Keep-Alive', 'timeout=30');
    next();
});
*/

/* ...your main logic... */

const server = app.listen(8080, 'localhost', () => {
    console.warn(`⚡️[server]: Server is running at http://localhost:8080`);
});

// The important lines: keep idle sockets open for 30 s, and set the
// headers timeout slightly higher than the keep-alive timeout.
server.keepAliveTimeout = 30 * 1000;
server.headersTimeout = 35 * 1000;
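The two properties set above belong to Node's `http.Server`, which is what `app.listen()` returns, so the same idea works without Express. Below is a minimal sketch under that assumption: the helper name `applyKeepAliveTimeouts` and the 5-second margin are illustrative choices, not something the original answer prescribes.

```javascript
const http = require('http');

// Hypothetical helper: sets the keep-alive timeout on a Node http.Server and
// derives a headers timeout that is always larger by `marginMs`.
function applyKeepAliveTimeouts(server, keepAliveMs = 30 * 1000, marginMs = 5 * 1000) {
  if (!(keepAliveMs > 0) || !(marginMs > 0)) {
    throw new RangeError('keepAliveMs and marginMs must be positive numbers');
  }
  server.keepAliveTimeout = keepAliveMs;            // how long an idle socket stays open
  server.headersTimeout = keepAliveMs + marginMs;   // kept above the keep-alive timeout
  return server;
}

// Plain http server with the same values as the Express example (30 s / 35 s).
// Calling server.listen(8080) would start it; that is left out to keep the sketch side-effect free.
const server = applyKeepAliveTimeouts(
  http.createServer((req, res) => res.end('ok'))
);
```

With Express, the same helper could wrap the return value of `app.listen(...)`.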

Reason

Some AWS components seem to demand a connection that is kept alive, even if the server responds otherwise (connection: close). When API Gateway (and possibly AWS ELBs) tries to reuse such a connection, the reuse fails because the other side has most likely already closed it, hence the assumed "NETWORK-FAILURE".

The error seems intermittent because API Gateway, at least, appears to close unused connections after a while, which gives a clean execution the next time. I can only assume they do this for high performance and do not fall back to anything less.
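To see which of the two behaviours a backend actually advertises, one option is to call it directly and look at the `Connection` and `Keep-Alive` response headers. The following is a minimal sketch of that check; the helper name `describeKeepAlive` is hypothetical, and it only reports what the headers say, not how API Gateway will react to them.

```javascript
// Hypothetical helper: summarizes what a response's headers say about
// connection reuse. `headers` is a plain object with lower-cased names,
// which is the shape Node's http module exposes as res.headers.
function describeKeepAlive(headers) {
  const connection = String(headers['connection'] || '').toLowerCase();
  const keepAlive = String(headers['keep-alive'] || '');
  const match = /timeout\s*=\s*(\d+)/i.exec(keepAlive);
  return {
    // true when the server announces it will close the socket after this response
    closesAfterResponse: connection.split(',').map((token) => token.trim()).includes('close'),
    // value of "timeout=N" in the Keep-Alive header, or null when absent
    advertisedTimeoutSeconds: match ? Number(match[1]) : null,
  };
}
```

A direct request to the backend, for example `http.get(url, (res) => console.log(describeKeepAlive(res.headers)))`, would then show whether it answers with `connection: close` or advertises a keep-alive timeout.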
