
AWS: how to send data from AWS Lambda to on-premises application

I am trying to send data (>10MB potentially) from an AWS Lambda function to an on-premises CDAP application that is writing to an on-premises data store.

I know I can expose a REST interface on the on-prem app for the Lambda to call, but I am wondering if it is possible to use a messaging system to integrate the on-prem resource with the AWS Lambdas (i.e., Lambda writes to a Kafka topic that the on-prem application can read from).

I don't know what the best practices are here, or whether this has been done before, and I would like to understand the different options.

Thanks in advance for the help.

Edit: We were running into issues with Kafka's 10MB limit on message sizes in our on-prem solution.

Updated answer to account for OP's preference for Kafka and to work around the 10MB limit:

Make Lambda send a message to Kafka.

To do this, you can either:

  1. Make your Kafka instance available outside your network so that Lambda can access it. Or,
  2. Put the Lambda in a VPC and connect the VPC to your internal network (via AWS Direct Connect or a VPN, if not already set up).

To work around the 10MB limit, split the data (more than 10MB) into smaller chunks and send multiple messages to Kafka. Then, if necessary, reassemble the chunks in your application.
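For example, here is a minimal chunking sketch in Python, assuming the kafka-python client; the broker address, topic name, and chunk size are placeholders:

```python
# Split a large payload into chunks below the broker's message-size limit and
# publish them under a shared key so they all land on the same partition, in order.
import uuid
from kafka import KafkaProducer

CHUNK_SIZE = 900 * 1024  # stay under a 1MB broker limit (placeholder value)

producer = KafkaProducer(bootstrap_servers="kafka.internal.example.com:9092")

def send_large_payload(topic: str, payload: bytes) -> None:
    batch_id = str(uuid.uuid4()).encode()  # common key -> same partition, ordered
    chunks = [payload[i:i + CHUNK_SIZE] for i in range(0, len(payload), CHUNK_SIZE)]
    for seq, chunk in enumerate(chunks):
        producer.send(
            topic,
            key=batch_id,
            value=chunk,
            headers=[("batch", batch_id),
                     ("seq", str(seq).encode()),
                     ("total", str(len(chunks)).encode())],
        )
    producer.flush()  # ensure everything is sent before the Lambda invocation ends
```

The on-prem consumer can then group records by the batch header and reassemble once it has received all of the chunks.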


Original answer: My recommendation would be:

Make your Lambda write to an SNS topic to which the on-prem application can subscribe.

This is the easiest solution to implement, though SNS might not be the best fit for your application.
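For illustration, a minimal boto3 sketch of this, where the topic ARN is a placeholder; note that SNS caps messages at 256KB, so a >10MB payload would still need to be staged elsewhere (such as S3):

```python
# Lambda handler that publishes its event to an SNS topic; the on-prem
# application subscribes to the topic (e.g. via an HTTPS endpoint).
import json
import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:to-onprem"  # placeholder

def handler(event, context):
    sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps(event))
    return {"status": "published"}
```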

Other options are:

  1. Same as above, but use Kinesis instead of SNS (see the sketch after this list). Whether to use SNS or Kinesis will depend on your application's needs.

  2. Run your Lambda in a VPC and connect the VPC to your on-prem network over a VPN. This will let your Lambda access resources (like a Kafka instance) in your private network.
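Here is a sketch of the Kinesis variant from option 1, assuming boto3; the stream name is a placeholder, and keep in mind each Kinesis record is limited to 1MB:

```python
# Lambda writes records to a Kinesis stream that the on-prem consumer polls.
import json
import boto3

kinesis = boto3.client("kinesis")

def handler(event, context):
    kinesis.put_record(
        StreamName="to-onprem",  # placeholder
        Data=json.dumps(event).encode(),
        PartitionKey=str(event.get("id", "default")),  # keeps related records ordered
    )
```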

Instead of maintaining an Apache Kafka cluster, you can use AWS SNS (push) or AWS SQS (pull), depending on the scale of the load from your AWS Lambda functions.

  • Use SQS (a fully managed queue service) if the scale is high, if your on-premises infrastructure lacks the streaming or queueing capability to handle the load, or if your on-premises resources lack redundancy. When using SQS, your on-premises environment polls the queue via the SQS SDKs, with the relevant IAM permissions.
  • Go with SNS (a fully managed pub-sub messaging service) if multiple resources in your environment need to be triggered by each Lambda execution and you have the infrastructure to handle the higher scale. When using SNS, you can use an HTTP(S) subscription to push to the on-premises resources.

Since neither SQS nor SNS supports a 10MB message size (both cap out at 256KB), after each execution you can push the 10MB of data to AWS S3, with the bucket configured to send event notifications to an SQS queue or SNS topic. Your on-premises resources can then read the message from SQS or SNS and download the file (with the 10MB of data) from S3.
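A sketch of both halves of that pattern, assuming boto3 and a bucket configured to send s3:ObjectCreated notifications to an SQS queue; the bucket name, key prefix, and queue URL are placeholders:

```python
# Lambda side: upload the large payload to S3; S3 then notifies the queue.
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "lambda-to-onprem"  # placeholder

def handler(event, context):
    s3.put_object(Bucket=BUCKET,
                  Key=f"payloads/{context.aws_request_id}",
                  Body=json.dumps(event).encode())

# On-prem side: poll the queue, then download the object the event points at.
def consume(queue_url: str) -> None:
    sqs = boto3.client("sqs")
    resp = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        for record in json.loads(msg["Body"]).get("Records", []):
            s3.download_file(record["s3"]["bucket"]["name"],
                             record["s3"]["object"]["key"],
                             "/tmp/payload.json")
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```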

From AWS Lambda, publish to an AWS-hosted Apache Kafka cluster using the Confluent REST Proxy. This could even be a hosted service like Confluent Cloud, which runs in AWS, or a Kafka cluster in your own VPC. Then you can replicate the data from your AWS Kafka cluster to the on-prem cluster in several ways, including MirrorMaker, Confluent Replicator, another HTTPS or WSS proxy, etc. It should be a “pull” from the on-prem side, tunneled over SSL/TLS, or it won't get through most client-side firewalls.
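For instance, a Lambda-side sketch using the REST Proxy's v2 produce API with only the Python standard library; the proxy URL and topic name are placeholders:

```python
# POST a JSON record to a Kafka topic through the Confluent REST Proxy.
import json
import urllib.request

PROXY_URL = "https://rest-proxy.example.com/topics/lambda-events"  # placeholder

def publish(record: dict) -> None:
    body = json.dumps({"records": [{"value": record}]}).encode()
    req = urllib.request.Request(
        PROXY_URL, data=body, method="POST",
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
    )
    with urllib.request.urlopen(req) as resp:  # raises HTTPError on failure
        resp.read()
```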

There is no hard 10MB limit on Kafka messages; the maximum message size is a configurable broker parameter (message.max.bytes). However, it is a best practice to keep messages below 10MB, or even below the default maximum of roughly 1MB. For larger messages you typically either compress them, break them into a sequence of smaller messages (with a common key so they stay in order and go to the same partition), or store the large message in S3 or another external store and publish a reference to the storage location so the consumer can retrieve it out of band from Kafka.
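A minimal sketch of the claim-check variant (store the payload in S3, publish only a pointer), assuming boto3 and kafka-python; the bucket, topic, and broker address are placeholders:

```python
# Store the large message in S3 and publish a small reference to it on Kafka;
# the consumer fetches the payload from S3 out of band.
import json
import boto3
from kafka import KafkaProducer

s3 = boto3.client("s3")
producer = KafkaProducer(bootstrap_servers="kafka.internal.example.com:9092")

def publish_reference(bucket: str, key: str, payload: bytes) -> None:
    s3.put_object(Bucket=bucket, Key=key, Body=payload)
    pointer = {"bucket": bucket, "key": key, "size": len(payload)}
    producer.send("large-payloads", value=json.dumps(pointer).encode())
    producer.flush()
```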

Step #1 -> Create a stream in CDAP
Step #2 -> Push the data to the stream using a REST call from your Lambda function (see the sketch below)
Step #3 -> Create the pipeline in CDAP
Step #4 -> Set the source as the stream and the sink as the database
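A sketch of Step #2, assuming CDAP's v3 Stream HTTP API, where the POST body becomes the stream event; the host, port, namespace, and stream name are placeholders, and the exact endpoint should be verified against the docs for your CDAP version:

```python
# Lambda handler that pushes its event to a CDAP stream over REST.
import json
import urllib.request

CDAP_URL = "http://cdap.example.com:11015/v3/namespaces/default/streams/lambdaStream"

def handler(event, context):
    req = urllib.request.Request(CDAP_URL,
                                 data=json.dumps(event).encode(),
                                 method="POST")
    with urllib.request.urlopen(req) as resp:
        return {"status": resp.status}
```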
