How to run a Spark job on EMR via CloudFormation
I am just getting started with AWS and have been playing around with EMR and CloudFormation. My goal is to write a CloudFormation template that will:
1. Create an EMR cluster with Spark and Hadoop installed
2. Run Spark jobs on the EMR cluster. Jobs will be submitted as JAR or PySpark files.
I have been able to successfully complete Step 1, but I am not sure how Step 2 is supposed to be done via CloudFormation.
I have looked at several examples in the AWS documentation and on other sites, but I could not find one where a Spark job was deployed via a CloudFormation template.
Any examples or pointers in the right direction would be very helpful. Thanks in advance!
Change your EMR CloudFormation template as follows. First, add these entries to the Parameters section:
  StepScriptFilePath:
    Type: String
    Description: Step script to run a bash script, or add a Java file here
    Default: 's3://s3-bucket/steps/step1.sh'
  StepScriptFilePython:
    Type: String
    Description: Step script to run a Python file
    Default: 's3://s3-bucket/steps/step2.py'
  StepJar:
    Type: String
    Description: Spark jar file
    Default: 's3://elasticmapreduce/libs/script-runner/script-runner.jar'
Then add this under the EMR cluster's Properties:
    Steps:
      - ActionOnFailure: CONTINUE
        HadoopJarStep:
          Args:
            - Ref: StepScriptFilePath
          Jar:
            Ref: StepJar
          MainClass: ''
        Name: run any bash or java job in spark
      - ActionOnFailure: CONTINUE
        HadoopJarStep:
          Args:
            - spark-submit
            - Ref: StepScriptFilePython
          Jar: command-runner.jar
        Name: run a python script job
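Since the question also asks about submitting jobs packaged as a JAR, a step can invoke spark-submit through command-runner.jar directly, the same way the Python step above does. A minimal sketch, where the JAR location (s3://s3-bucket/jobs/my-spark-job.jar) and the main class (com.example.MySparkJob) are hypothetical and should be replaced with your own:

```yaml
      - ActionOnFailure: CONTINUE
        HadoopJarStep:
          Args:
            - spark-submit
            - --deploy-mode
            - cluster
            - --class
            - com.example.MySparkJob              # hypothetical entry point
            - s3://s3-bucket/jobs/my-spark-job.jar  # hypothetical JAR location
          Jar: command-runner.jar
        Name: run a spark job from a jar
```

Note that command-runner.jar is referenced by name only (it is available locally on EMR nodes), while your application JAR is fetched from S3 by spark-submit.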