I am just getting started with AWS and have been experimenting with EMR and CloudFormation. My goal is to write a CloudFormation template that will:
1. Create an EMR cluster with Spark and Hadoop installed
2. Run Spark jobs on the EMR cluster. Jobs will be submitted as a JAR or as PySpark files.
I have been able to complete Step 1 successfully, but I am not sure how Step 2 is supposed to be done via CloudFormation.
I have looked at several examples in the AWS documentation and on other sites, but I could not find one where a Spark job was deployed via a CloudFormation template.
Any examples or pointers in the right direction would be very helpful. Thanks in advance!
Change the Parameters section of your EMR CloudFormation template like this:
  StepScriptFilePath:
    Type: String
    Description: Step script to run a bash script, or put a Java JAR path here
    Default: 's3://s3-bucket/steps/step1.sh'
  StepScriptFilePython:
    Type: String
    Description: Step script to run a Python file
    Default: 's3://s3-bucket/steps/step2.py'
  StepJar:
    Type: String
    Description: Spark jar file
    Default: 's3://elasticmapreduce/libs/script-runner/script-runner.jar'
Then add this under the EMR cluster's Properties:
  Steps:
    - ActionOnFailure: CONTINUE
      HadoopJarStep:
        Args:
          - Ref: StepScriptFilePath
        Jar:
          Ref: StepJar
        MainClass: ''
      Name: run any bash or java job in spark
    - ActionOnFailure: CONTINUE
      HadoopJarStep:
        Args:
          - spark-submit
          - Ref: StepScriptFilePython
        Jar: command-runner.jar
      Name: run a python script job
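For the JAR case mentioned in the question, the same `command-runner.jar` / `spark-submit` pattern works; you just pass `--class` and the JAR's S3 path as the step arguments. As a minimal sketch, the step can also be built and submitted to a running cluster with boto3's `add_job_flow_steps`. The bucket path, main class, and cluster id below are illustrative placeholders, not values from the original post:

```python
# Sketch: build a spark-submit step for a JAR job and (optionally) submit it
# with boto3. The S3 path, main class, and cluster id are placeholders.

def build_spark_jar_step(name, jar_s3_path, main_class, extra_args=()):
    """Return an EMR step dict that runs spark-submit via command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--class", main_class,
                     jar_s3_path, *extra_args],
        },
    }

step = build_spark_jar_step(
    name="run a spark jar job",
    jar_s3_path="s3://s3-bucket/steps/job.jar",  # placeholder path
    main_class="com.example.Main",               # placeholder class
)

# To submit against a live cluster (requires AWS credentials):
# import boto3
# emr = boto3.client("emr")
# emr.add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=[step])
```

The same dict shape maps one-to-one onto the CloudFormation `Steps` entries above, so you can keep the step definition in the template or submit it programmatically after the stack is up.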