
How to manage Postgres connection in concurrent AWS lambda function?

Does anybody have experience building concurrent AWS Lambda functions with Postgres?

I have to build a Lambda cron that will ingest thousands of invoices into a Postgres database, and I have to call the ingestion Lambda function concurrently, once for each invoice. The problem is that, because it is concurrent, each instance of the ingestion function creates its own connection to the database. That means that if I have 1,000 invoices to ingest, each invoice invokes a Lambda function, which creates 1,000 database connections. This exhausts the maximum number of connections Postgres can handle, and some instances of the invoked Lambda function return an error saying that there are no more connections available.

Any tips you can give how to handle this problem?

Here are some snippets of my code:

ingestInvoiceList.js

var AWS = require('aws-sdk');
var SftpClient = require('ssh2-sftp-client');
var sftp = new SftpClient();

var lambda = new AWS.Lambda();

exports.handler = async (event) => {
   ...

        let folder_contents;
        try {
            // fetch list of Zip format invoices
            folder_contents = await sftp.list(client_folder);
        } catch (err) {
            console.log(`[${client}]: ${err.toString()}`);
            throw new Error(`[${client}]: ${err.toString()}`);
        }

        let invoiceCount = 0;

        let funcName = 'ingestInvoice';


        for (let item of folder_contents) {
            if (item.type === '-') {
                let payload = JSON.stringify({
                    invoice: item.name
                });
                let params = {
                    FunctionName: funcName,
                    Payload: payload,
                    InvocationType: 'Event'
                };


                // invoke ingestInvoice concurrently (asynchronous 'Event' invocation)
                let result = await new Promise((resolve) => {
                    lambda.invoke(params, (err, data) => {
                        if (err) resolve(err);
                        else resolve(data);
                    });
                });

                console.log('result: ', result);

                invoiceCount++;
            }
        }
   ...
}

ingestInvoice.js

var AWS = require('aws-sdk');
var fs = require('fs');
var JSZip = require('jszip');
var SftpClient = require('ssh2-sftp-client');
var DBClient = require('./db.js');

var lambda = new AWS.Lambda();
var sftp = new SftpClient();

exports.handler = async (event) => {
   ...

   let invoice = event.invoice;
   let client = 'client name';

   let db = new DBClient();

   try {
        console.log(`[${client}]: Extracting documents from ${invoice}`);

        try {
            // get zip file from sftp server
            await sftp.fastGet(invoice, '/tmp/tmp.zip', {});
        } catch (err) {
            throw err;
        }


        let zip;
        try {
            // extract the zip file...
            zip = await new Promise((resolve, reject) => {
                fs.readFile("/tmp/tmp.zip", async function (err, data) {
                    if (err) return reject(err);

                    let unzippedData;
                    try {
                        unzippedData = await JSZip.loadAsync(data);
                    } catch (err) {
                        return reject(err);
                    }

                    return resolve(unzippedData);
                });
            });

        } catch (err) {
            throw err;
        }

        let unibillRegEx = /unibill.+\.txt/;

        let files = {};
        zip.forEach((path, entry) => {
            if (unibillRegEx.test(entry.name)) {
                files.unibillObj = entry;
            } else {
                files.pdfObj = entry;
            }
        });


        // await db.getClient().connect();
        await db.setSchema(client);
        console.log('Schema has been set.');

        let unibillStr = await files.unibillObj.async('string');

        console.log('ingesting ', files.unibillObj.name);

        //Do ingestion queries here...
        ...

        await uploadInvoiceDocsToS3(client, files);

    } catch (err) {
        console.error(err.stack);
        throw err;
    } finally {
        try {
            // console.log('Disconnecting from database...');
            // await db.endClient();
            console.log('Disconnecting from SFTP...');
            await sftp.end();
        } catch (err) {
            console.log('ERROR: ' + err.toString());
            throw err;
        }
    }
   ...
}

db.js

var { Pool } = require('pg');

module.exports = class DBClient {
    constructor() {
        this.pool = new Pool();
    }

    async setSchema(schema) {
        await this.execQuery(`SET search_path TO ${schema}`);
    }

    async execQuery(sql) {
        return this.pool.query(sql);
    }
}

Any answer would be appreciated, thank you!

I see two ways to handle this. Ultimately it depends on how fast you want to process this data.

  1. Change the concurrency setting for your Lambda to use "Reserved Concurrency" (see the first sketch after this list).

This will allow you to limit the number of concurrent Lambdas running (see this link for more details).

  2. Change your code to queue the work to be done in an SQS queue. From there you would have to create another Lambda, triggered by the queue, that processes the messages as needed (see the second sketch after this list). That Lambda could decide how much to pull off the queue at a time, and it too would likely need its concurrency limited. But you could tune it to, for example, run for the maximum of 15 minutes, which may be enough to empty the queue without killing the DB. Or, if you had a max concurrency of, say, 100, you would process quickly without killing the DB.
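For the first option, reserved concurrency can be set in the Lambda console or programmatically. A minimal sketch using the same aws-sdk the question already imports (the function name is taken from the question, and the limit of 50 is a placeholder assumption; size it well below Postgres's max_connections):

var AWS = require('aws-sdk');
var lambda = new AWS.Lambda();

// Cap ingestInvoice at 50 concurrent executions, so even with 1000 invoices
// queued up, at most 50 database connections are open at any one time.
lambda.putFunctionConcurrency({
    FunctionName: 'ingestInvoice',      // function name from the question
    ReservedConcurrentExecutions: 50    // placeholder limit
}).promise()
    .then(() => console.log('Reserved concurrency set'))
    .catch(err => console.error(err));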
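For the second option, the loop in ingestInvoiceList.js would send one SQS message per invoice instead of invoking ingestInvoice directly, and a separate consumer Lambda (triggered by the queue, with its own concurrency limit) would do the ingestion. A rough sketch; the queue URL below is a placeholder:

var AWS = require('aws-sdk');
var sqs = new AWS.SQS();

// In ingestInvoiceList.js: enqueue each invoice instead of invoking the ingest Lambda.
async function enqueueInvoice(invoiceName) {
    await sqs.sendMessage({
        QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123456789012/invoice-queue', // placeholder
        MessageBody: JSON.stringify({ invoice: invoiceName })
    }).promise();
}

// In the consumer Lambda (triggered by the SQS queue):
exports.handler = async (event) => {
    for (let record of event.Records) {
        let { invoice } = JSON.parse(record.body);
        console.log('ingesting', invoice);
        // ...same ingestion logic as ingestInvoice.js...
    }
};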

First, you have to initialize your connection outside the handler, so that each time your warm Lambda is executed it won't open a new one:

const db = new DBClient();

exports.handler = async (event) => {
   ...

   await db.execQuery(...)

   ...
}
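Applied to the db.js from the question, that could look like the sketch below: the Pool is created once at module scope, so warm invocations of ingestInvoice reuse the same connection instead of opening a new one each time (the max: 1 setting is an assumption to keep the per-container connection count predictable):

var { Pool } = require('pg');

// Created once per Lambda container and reused across warm invocations.
var pool = new Pool({ max: 1 }); // one connection per container (assumed limit)

module.exports = class DBClient {
    async setSchema(schema) {
        await this.execQuery(`SET search_path TO ${schema}`);
    }

    async execQuery(sql) {
        return pool.query(sql);
    }
};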

If it is node-pg, there is a package that keeps track of all the idle connections, kills them if necessary, and retries in case of the error "sorry, too many clients already": https://github.com/MatteoGioioso/serverless-pg . Any other custom retry mechanism with backoff will work as well.
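A rough usage sketch based on that repository's README (assuming the package is installed from npm as serverless-postgres; option names may differ between versions, so verify against the repo):

var ServerlessClient = require('serverless-postgres');

// One client per container; the library tracks idle connections,
// cleans them up, and retries on "sorry, too many clients already".
var client = new ServerlessClient({
    user: process.env.DB_USER,
    host: process.env.DB_HOST,
    database: process.env.DB_NAME,
    password: process.env.DB_PASSWORD,
    port: 5432
});

exports.handler = async (event) => {
    await client.connect();
    let result = await client.query('SELECT 1 + 1 AS result');
    await client.clean(); // release idle connections instead of holding them
    return result.rows[0];
};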

There is also one for MySQL: https://github.com/jeremydaly/serverless-mysql

These days a good solution to consider for this problem on AWS is RDS Proxy, which acts as a transparent proxy between your Lambda(s) and the database:

Amazon RDS Proxy allows applications to pool and share connections established with the database, improving database efficiency, application scalability, and security.
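Because the proxy is transparent, the application code barely changes: you point pg at the proxy endpoint instead of the database host. A sketch with a hypothetical proxy endpoint and environment-based credentials:

var { Pool } = require('pg');

// Connect to the RDS Proxy endpoint; the proxy multiplexes many Lambda
// connections onto a small pool of real Postgres connections.
var pool = new Pool({
    host: 'my-proxy.proxy-abcdefghijkl.us-east-1.rds.amazonaws.com', // placeholder endpoint
    user: process.env.DB_USER,
    password: process.env.DB_PASSWORD,
    database: process.env.DB_NAME,
    port: 5432,
    ssl: true // TLS is commonly required by RDS Proxy
});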
