
Suitability of app with long running tasks for AWS Lambda or AWS Step Functions

I have an application on an AWS EC2 instance that runs once daily. The application fetches some files from a web service, parses the files line by line, updates a database, updates S3 files based on changes in the database, sends notification emails to customers, and performs a few other tasks.

This is a series of logical tasks that must take place in sequence, although some of the tasks can be thought of as sub-tasks that can be executed in parallel. All tasks are a combination of Perl scripts and Java programs, with a single Perl script acting as the manager that executes each in turn. Some tasks can take as long as 45 minutes to complete, and the whole process can take up to 3 hours in total.

I'd like to make this whole process serverless. My initial idea was to use AWS Lambda, whereby each task would execute as a Lambda function, until I discovered that Lambda functions impose a 5-minute execution timeout. It seems like the AWS Step Functions service is actually a better fit for my use case, but my understanding is that this service is backed by Lambda, so the tasks will still have the 5-minute execution limit.

(I'm also aware that I would have to rewrite my Perl scripts in a language supported by Lambda.)

I assume that I can work around the execution time limit by refactoring my code into smaller functions that are guaranteed to complete in under 5 minutes. In my particular situation, though, this seems inefficient.

Currently, the database update task processes lines from a file one at a time. For this to work with Lambda, a Lambda function would need to handle only a single line from the file (or a very small number of lines) to guarantee it doesn't exceed the 5-minute execution time. This would involve opening and closing a connection with the database on every invocation of the Lambda function. Also, each line processed should result in an entry written to a file, to be stored in S3. Right now, I just keep a file handle open in memory and write the file to S3 when all lines are processed, but with Lambda I would need to keep reading the file from S3, updating it, and writing it back.

What I'm asking is:

  • Is my use case a bad fit for AWS Lambda and/or AWS Step Functions?
  • Have I misunderstood how these services work?
  • Is there another AWS service that would be a better fit for my use case?

After further research, I think AWS Batch might be a good idea.

So to answer your questions:

1) Yeah, if you've got something that'll run for around 45 minutes, whilst you could engineer it with Lambda/Step Functions, you're probably better off getting an EC2 micro instance.

2) Nope, you've pretty much got it.

3) As above, you want to go with EC2 for this. There's a good article on using Data Pipelines to start/stop an EC2 instance here; that way, by starting the instance only when you need it, the cost (if any) is negligible.

I have jobs that run in this fashion; normally you can get away with a t2.micro instance, which is free-tier eligible.

You can also run your Perl scripts on an EC2 instance, so there's no need to rewrite them!

What you want are called Activity Workers. TL;DR: you register "activities", and each one gets an ARN. You put that ARN in the resource field of a Task state, then run some code (the "worker") somewhere (in a Lambda, on EC2, in your basement, wherever) that polls for tasks identified by that ARN and calls back to report success or failure. Activity Workers can run for up to a year.
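For illustration, a minimal state machine with a Task state referencing an activity might look like the following sketch (the ARN, state name, and timeout are hypothetical placeholders, not the asker's actual setup):

```json
{
  "StartAt": "ColorTurtles",
  "States": {
    "ColorTurtles": {
      "Type": "Task",
      "Resource": "arn:aws:states:us-east-1:123456789012:activity:ColorTurtles",
      "TimeoutSeconds": 10800,
      "End": true
    }
  }
}
```

The Task state just names the activity ARN; it does no work itself. The execution pauses at this state until some worker polling on that ARN reports a result.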

Step-by-step details are in the AWS docs.

In response to RTF's comment, here's a deeper dive. Suppose you have code to color turtles in color_turtles.pl. You call the CreateActivity API (see http://docs.aws.amazon.com/step-functions/latest/apireference/API_CreateActivity.html) with the name "ColorTurtles", and it gives you back an ARN, a string beginning arn:aws... Then, in your state machine, you make a Task state with that ARN as the value of the resource field. Next, you add code to color_turtles.pl to poll the service with GetActivityTask (http://docs.aws.amazon.com/step-functions/latest/apireference/API_GetActivityTask.html). Whenever a machine you're running reaches that task, it looks for activity workers polling on that ARN and hands your polling worker the input for the task; you process the input, generate some output, and call SendTaskSuccess or SendTaskFailure. All of these are just REST HTTP calls, so you can run them anywhere, and I mean anywhere: in a Lambda, on an EC2 instance, or on some computer anywhere on the Internet.
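To make that concrete, here is a minimal worker sketch using the AWS SDK for Java v2 (Java being one of the languages already in the asker's stack); the activity ARN and doWork() are hypothetical stand-ins for your own values:

```java
// Minimal Step Functions activity worker (AWS SDK for Java v2).
import software.amazon.awssdk.services.sfn.SfnClient;
import software.amazon.awssdk.services.sfn.model.GetActivityTaskResponse;

public class ColorTurtlesWorker {
    // Hypothetical ARN; use the one CreateActivity returned
    static final String ACTIVITY_ARN =
        "arn:aws:states:us-east-1:123456789012:activity:ColorTurtles";

    public static void main(String[] args) {
        SfnClient sfn = SfnClient.create();
        while (true) {
            // Long-polls for up to ~60 seconds; taskToken is empty if no work is queued
            GetActivityTaskResponse task = sfn.getActivityTask(r -> r
                .activityArn(ACTIVITY_ARN)
                .workerName("color-turtles-worker"));
            if (task.taskToken() == null || task.taskToken().isEmpty()) {
                continue; // nothing to do yet; poll again
            }
            try {
                String output = doWork(task.input()); // your long-running job
                sfn.sendTaskSuccess(r -> r.taskToken(task.taskToken()).output(output));
            } catch (Exception e) {
                sfn.sendTaskFailure(r -> r
                    .taskToken(task.taskToken())
                    .error("TaskFailed")
                    .cause(String.valueOf(e.getMessage())));
            }
        }
    }

    // Placeholder for the real work, e.g. shelling out to color_turtles.pl
    static String doWork(String inputJson) {
        return "{\"status\":\"colored\"}";
    }
}
```

One caveat: if the Task state configures a heartbeat timeout (HeartbeatSeconds), a long-running worker should also call SendTaskHeartbeat periodically so the execution isn't timed out mid-task.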

I'll start by noting that you seem to be looking for workflow solutions on AWS. SWF and Step Functions are the two most popular ones. Step Functions is the more recent offering, and AWS encourages it over SWF.

SWF has native support for long-running tasks; the downside is that you have to provide your own execution environment for deciders (you can't use Lambda).

With Step Functions, you can do this in two different ways. One approach is the one Tim suggests in his answer. An alternative way to achieve the same thing is to use a job poller in Step Functions: the poller calls (polls) your resource to find out whether the task is done, and if it isn't, sends the execution into a Wait state for a specified time (see the sketch below). As mentioned above, the maximum execution time currently allowed for any workflow is 1 year, so if you have tasks that may take longer than a year, you can't use Step Functions in its current form.
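A minimal sketch of that poll-and-wait pattern in Amazon States Language (the Lambda ARNs, state names, and the $.status field are hypothetical placeholders for whatever starts and checks your job):

```json
{
  "Comment": "Poll a long-running job until it reports completion",
  "StartAt": "StartJob",
  "States": {
    "StartJob": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:StartJob",
      "Next": "WaitForJob"
    },
    "WaitForJob": {
      "Type": "Wait",
      "Seconds": 300,
      "Next": "CheckJob"
    },
    "CheckJob": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:CheckJob",
      "Next": "IsJobDone"
    },
    "IsJobDone": {
      "Type": "Choice",
      "Choices": [
        { "Variable": "$.status", "StringEquals": "DONE", "Next": "JobSucceeded" }
      ],
      "Default": "WaitForJob"
    },
    "JobSucceeded": { "Type": "Succeed" }
  }
}
```

Each Lambda invocation in this loop stays well under the timeout because it only starts or checks the job; the actual long-running work happens elsewhere (e.g. on EC2 or in AWS Batch).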
