
How to run PySpark on AWS EMR with AWS Lambda

How can I run my PySpark code on AWS EMR from AWS Lambda? Do I have to use AWS Lambda to create an auto-terminating EMR cluster to run my S3-stored code once?

You need a transient cluster for this use case: one that auto-terminates once your job completes or the timeout is reached, whichever occurs first.

See the linked guide for how to initialise one.

There are several ways to create an EMR cluster:

  1. Using boto3 / AWS CLI / Java SDK
  2. Using CloudFormation
  3. Using AWS Data Pipeline

Do I have to use AWS Lambda to create an auto-terminating EMR cluster to run my S3-stored code once?

No. It isn't mandatory to use Lambda to create an auto-terminating cluster.

You just need to pass the --auto-terminate flag when creating the cluster with the AWS CLI (or the equivalent setting in boto3 / the Java SDK). In that case you must submit the job steps together with the cluster configuration, since there is no long-lived cluster to attach a job to later.
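As a hedged sketch, creating such a transient cluster with boto3 might look like this. The bucket path, instance types, and EMR release label are placeholder assumptions; `KeepJobFlowAliveWhenNoSteps: False` is boto3's equivalent of the CLI's --auto-terminate flag.

```python
def build_transient_cluster_config(script_s3_path: str) -> dict:
    """Build run_job_flow parameters for a cluster that shuts down after its steps."""
    return {
        "Name": "transient-pyspark-cluster",           # illustrative name
        "ReleaseLabel": "emr-6.15.0",                  # assumed release label
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # Equivalent of the CLI's --auto-terminate: terminate the cluster
            # once there are no more steps left to run.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "run-pyspark-script",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster",
                             script_s3_path],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(script_s3_path: str) -> str:
    """Create the cluster and return its cluster id (requires AWS credentials)."""
    import boto3  # imported lazily so the sketch loads without the SDK installed
    emr = boto3.client("emr")
    response = emr.run_job_flow(**build_transient_cluster_config(script_s3_path))
    return response["JobFlowId"]
```

Because the step is submitted as part of the cluster config, the cluster boots, runs the script, and terminates on its own.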

Note:

It's not possible to create an auto-terminating cluster using CloudFormation alone. By design, CloudFormation assumes that the resources being created will be permanent to some extent.

If you really had to do it this way, you could make an AWS API call to delete the CloudFormation stack once your EMR tasks finish.
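A minimal sketch of that cleanup call with boto3 (the stack name is an illustrative assumption; the optional `cf_client` argument only exists so the function can be exercised without AWS credentials):

```python
def delete_stack(stack_name: str, cf_client=None, wait: bool = False) -> None:
    """Delete the CloudFormation stack that owns the EMR cluster."""
    if cf_client is None:
        import boto3  # imported lazily so the sketch loads without the SDK
        cf_client = boto3.client("cloudformation")
    cf_client.delete_stack(StackName=stack_name)
    if wait:
        # Block until CloudFormation confirms the stack is gone.
        cf_client.get_waiter("stack_delete_complete").wait(StackName=stack_name)
```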

How can I run my PySpark code on AWS EMR from AWS Lambda?

You can design your Lambda function to submit the Spark job (see the linked example).

In my use case I have one parameterised Lambda function that invokes CloudFormation to create the cluster, submits the job, and terminates the cluster.
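As a hedged sketch of the job-submission part, a parameterised Lambda handler could read the S3 path of the PySpark script from the invocation event and add a spark-submit step to a cluster. The event keys (`cluster_id`, `script_s3_path`, `spark_args`) are illustrative assumptions, and the optional `emr_client` argument is only there to make the handler testable without AWS credentials.

```python
def build_spark_step(script_s3_path, spark_args=()):
    """Build an EMR step that runs spark-submit on an S3-hosted PySpark script."""
    return {
        "Name": "run-" + script_s3_path.rsplit("/", 1)[-1],
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     *spark_args, script_s3_path],
        },
    }


def lambda_handler(event, context, emr_client=None):
    """Submit the PySpark job described by the event to an existing cluster."""
    if emr_client is None:
        import boto3  # imported lazily so the sketch loads without the SDK
        emr_client = boto3.client("emr")
    step = build_spark_step(event["script_s3_path"],
                            event.get("spark_args", []))
    response = emr_client.add_job_flow_steps(JobFlowId=event["cluster_id"],
                                             Steps=[step])
    return {"StepIds": response["StepIds"]}
```

The same handler could instead call `run_job_flow` with an auto-terminating config to cover the create-run-terminate flow end to end.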
