Intermittent timeouts between AWS Lambda and RDS

We are currently experiencing what I can only describe as random, intermittent timeouts between AWS Lambda and RDS. After we deploy our functions and run them successfully, they can randomly switch to a state of timing out, with no configuration changes. Importantly, we are also monitoring the DB connections and can confirm that we aren't hitting the maximum connection limit.

Here are the details of our setup:

Code being executed (using Node.js v6.10):

const mysql = require('mysql');

exports.dbWrite = (events, context, callback) => {

   const db = mysql.createConnection({
       host: <redacted>,
       user: <redacted>,
       password: <redacted>,
       database: <redacted>
   });

   db.connect(function (err) {
       if (err) {
           console.error('error connecting: ' + err.stack);
           return;
       }

       console.log('connected !');
   });

   db.end();

};

We are using the Node.js mysql library, v2.14.1.

From a networking perspective:

  • The Lambda function is in the same VPC as our RDS instance.
  • The Lambda function has subnets assigned, and those subnets are associated with a route table that has no internet access (it is not associated with an internet gateway).
  • The RDS database is not publicly accessible.
  • A security group has been created and associated with the Lambda function, with wide-open access on all ports (for now; that will change once DB connectivity is reliable).
  • The above security group has been whitelisted on port 3306 within a security group associated with the RDS instance.

CloudWatch error:

{
  "errorMessage": "connect ETIMEDOUT",
  "errorType": "Error",
  "stackTrace": [
    "Connection._handleConnectTimeout (/var/task/node_modules/mysql/lib/Connection.js:419:13)",
    "Socket.g (events.js:292:16)",
    "emitNone (events.js:86:13)",
    "Socket.emit (events.js:185:7)",
    "Socket._onTimeout (net.js:338:8)",
    "ontimeout (timers.js:386:14)",
    "tryOnTimeout (timers.js:250:5)",
    "Timer.listOnTimeout (timers.js:214:5)",
    "    --------------------",
    "Protocol._enqueue (/var/task/node_modules/mysql/lib/protocol/Protocol.js:145:48)",
    "Protocol.handshake (/var/task/node_modules/mysql/lib/protocol/Protocol.js:52:23)",
    "Connection.connect (/var/task/node_modules/mysql/lib/Connection.js:130:18)",
    "Connection._implyConnect (/var/task/node_modules/mysql/lib/Connection.js:461:10)",
    "Connection.query (/var/task/node_modules/mysql/lib/Connection.js:206:8)",
    "/var/task/db-write-lambda.js:52:12",
    "getOrCreateEventTypeId (/var/task/db-write-lambda.js:51:12)",
    "exports.dbWrite (/var/task/db-write-lambda.js:26:9)"
  ]
}

Amongst the references already reviewed:

In summary, the fact that these timeouts are intermittent makes this issue thoroughly confusing. AWS support has stated that NodeJS-mysql is a third-party tool and is technically not supported, but I know folks are using this technique.

Any help is greatly appreciated!

Considering that the RDS connections are not exhausted, it is possible that a Lambda running in one particular subnet is consistently failing to connect to the DB. I am assuming that the RDS instances and the Lambdas are running in separate subnets. One way to investigate this is to check the flow logs.

Go to EC2 -> Network Interfaces -> search for the Lambda name -> copy the ENI reference, then go to VPC -> Subnets -> select the Lambda's subnet -> Flow Logs -> search by the ENI reference.

If you see "REJECT OK" in the flow logs for your DB port, the traffic is being blocked, which means that configuration is missing in the network ACLs (or in the security group rules).
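When the flow logs are exported as text, the check described above can also be scripted. The sketch below assumes records in the default (version 2) flow log format, whose fields are space-separated in the order listed in `FLOW_LOG_FIELDS`; the function names, the ENI ID, and the sample records are made up for illustration.

```javascript
// Field order of the default (version 2) VPC flow log record format.
const FLOW_LOG_FIELDS = [
  'version', 'accountId', 'interfaceId', 'srcAddr', 'dstAddr',
  'srcPort', 'dstPort', 'protocol', 'packets', 'bytes',
  'start', 'end', 'action', 'logStatus'
];

// Turns one record line into an object, or null if it has an unexpected shape.
function parseFlowLogRecord(line) {
  const parts = line.trim().split(/\s+/);
  if (parts.length !== FLOW_LOG_FIELDS.length) return null;
  const record = {};
  FLOW_LOG_FIELDS.forEach(function (name, i) { record[name] = parts[i]; });
  return record;
}

// Returns the REJECT records for one ENI that involve the database port.
function findRejectedDbTraffic(lines, eniId, dbPort) {
  const port = String(dbPort);
  return lines
    .map(parseFlowLogRecord)
    .filter(function (r) {
      return r !== null &&
        r.interfaceId === eniId &&
        r.action === 'REJECT' &&
        (r.dstPort === port || r.srcPort === port);
    });
}

// Made-up sample records: one accepted and one rejected attempt on port 3306.
const sampleLines = [
  '2 123456789012 eni-0abc1234 10.0.1.15 10.0.2.20 49152 3306 6 3 180 1500000000 1500000060 ACCEPT OK',
  '2 123456789012 eni-0abc1234 10.0.1.15 10.0.2.20 49153 3306 6 1 60 1500000060 1500000120 REJECT OK'
];

const rejected = findRejectedDbTraffic(sampleLines, 'eni-0abc1234', 3306);
console.log(rejected.length + ' rejected record(s) on port 3306');
```

Any record returned by `findRejectedDbTraffic` identifies a source and destination address pair whose subnet ACLs and security group rules are worth comparing against a pair that was accepted.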

Updating this issue: it turns out that the problem was related to the fact that the database connection was being made within the handler! Due to the asynchronous nature of Lambda and Node, this was the culprit for the intermittent timeouts.

Here's the revised code:

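const mysql = require('mysql');

// Create the connection once, outside the handler, so that warm
// invocations of the same container reuse it.
const database = getConnection();
let connected = false;

exports.dbWrite = (events, context, callback) => {
   // Return as soon as callback is invoked, rather than waiting for
   // the open connection to leave the event loop.
   context.callbackWaitsForEmptyEventLoop = false;

   // connect() may only be called once per connection object.
   if (connected) {
       console.log('reusing connection');
       return callback(null, 'connected');
   }

   database.connect(function (err) {
       if (err) {
           console.error('error connecting: ' + err.stack);
           return callback(err);
       }

       connected = true;
       console.log('connected !');
       callback(null, 'connected');
   });

   // The shared connection is deliberately not ended here; ending it
   // would leave nothing for the next invocation to reuse.
};

function getConnection() {
   let db = mysql.createConnection({
       host: process.env.DB_HOST,
       user: process.env.DB_USER,
       password: process.env.DB_PASS,
       database: process.env.DB_NAME
   });

   console.log('Host: ' + process.env.DB_HOST);
   console.log('User: ' + process.env.DB_USER);
   console.log('Database: ' + process.env.DB_NAME);

   console.log('Connecting to ' + process.env.DB_HOST + '...');

   return db;
}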
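The same idea, creating the database resource once per container instead of once per invocation, could also be sketched with a connection pool. This is not the code from the update above, only an illustration under stated assumptions: `makeDbWriteHandler` is a hypothetical helper, the pool factory is passed in so the sketch can be exercised without a database, and `SELECT 1 AS ok` stands in for the real write.

```javascript
// Hypothetical factory: builds a Lambda handler around a pool that is
// created on the first invocation and then reused by later ones.
// In a real function, createPool would be require('mysql').createPool.
function makeDbWriteHandler(createPool, config) {
  let pool = null; // kept for as long as the Lambda container is alive

  return function dbWrite(event, context, callback) {
    // Do not hold the invocation open just because the pool has open sockets.
    context.callbackWaitsForEmptyEventLoop = false;

    if (pool === null) {
      pool = createPool(config);
    }

    // Placeholder query; pool.query borrows a connection and releases it.
    pool.query('SELECT 1 AS ok', function (err, rows) {
      if (err) {
        console.error('query failed: ' + err.stack);
        return callback(err);
      }
      callback(null, rows);
    });
  };
}

// Possible wiring in a deployed function (assumes the mysql package is bundled):
//
// exports.dbWrite = makeDbWriteHandler(require('mysql').createPool, {
//   connectionLimit: 1,
//   host: process.env.DB_HOST,
//   user: process.env.DB_USER,
//   password: process.env.DB_PASS,
//   database: process.env.DB_NAME
// });
```

A pool also replaces connections that have died between invocations, which a single long-lived connection object does not do on its own. The `connectionLimit` of 1 in the commented wiring is an assumption, based on each container handling one invocation at a time.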
