
How to run a Spark job on EMR via CloudFormation

I am just getting started with AWS and have been playing around with EMR and CloudFormation. My goal is to write a CloudFormation template that will:

1. Create an EMR cluster with Spark and Hadoop installed
2. Run Spark jobs on the EMR cluster. Jobs will be submitted as JAR or PySpark files.

I have completed Step 1 successfully, but I am not sure how Step 2 is supposed to be done via CloudFormation.

I have looked at several examples in the AWS documentation and on other sites, but I could not find one where a Spark job is deployed via a CloudFormation template.

Any examples or pointers in the right direction would be very helpful. Thanks in advance!

Add the following to the Parameters section of your EMR CloudFormation template:

StepScriptFilePath:
  Type: String
  Description: S3 path of the step script to run (a bash script by default)
  Default: 's3://s3-bucket/steps/step1.sh'
StepScriptFilePython:
  Type: String
  Description: S3 path of the Python file to run with spark-submit
  Default: 's3://s3-bucket/steps/step2.py'
StepJar:
  Type: String
  Description: JAR that runs the script step (script-runner.jar by default)
  Default: 's3://elasticmapreduce/libs/script-runner/script-runner.jar'

Then add this under the Properties of the EMR cluster resource:

  Steps:
    # Step 1: script-runner.jar downloads the script from S3 and runs it
    - ActionOnFailure: CONTINUE
      HadoopJarStep:
        Args:
          - Ref: StepScriptFilePath
        Jar:
          Ref: StepJar
      Name: run a bash script
    # Step 2: command-runner.jar runs spark-submit on the cluster
    - ActionOnFailure: CONTINUE
      HadoopJarStep:
        Args:
          - "spark-submit"
          - Ref: StepScriptFilePython
        Jar: command-runner.jar
      Name: run a python script job
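
Since the question also mentions submitting a JAR, here is a rough sketch of how that might look as a separate AWS::EMR::Step resource attached to the cluster. The resource name EmrCluster, the main class com.example.Main and the JAR path are all placeholders I made up for illustration, so swap in your own values:

```yaml
# Sketch only: EmrCluster, com.example.Main and the JAR path are hypothetical.
Resources:
  SparkJarStep:
    Type: AWS::EMR::Step
    Properties:
      Name: run a spark jar job
      ActionOnFailure: CONTINUE
      # Ref on an AWS::EMR::Cluster resource returns the cluster ID
      JobFlowId:
        Ref: EmrCluster
      HadoopJarStep:
        # command-runner.jar runs the given command on the cluster
        Jar: command-runner.jar
        Args:
          - "spark-submit"
          - "--deploy-mode"
          - "cluster"
          - "--class"
          - "com.example.Main"             # hypothetical main class
          - "s3://s3-bucket/jars/app.jar"  # hypothetical JAR location
```

Keeping the step in its own resource means you can add or change jobs without touching the cluster definition, though the same HadoopJarStep block should also work inline in the cluster's Steps list.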
