
AWS EMR Spark is not loading MainClass using custom Jar

I'm trying to create an EMR Spark cluster with a single custom step. The cluster is created successfully; however, the step is not defined correctly.

UPDATE

I tried to launch the same cluster via the web console and got the same result. Although I specify the JAR location, when I save the step the JAR location is set to command-runner.jar and the JAR path I provided is added to the Arguments list.

CLI Command:

aws emr create-cluster --name 'emr-test' \
--applications Name=Spark \
--release-label emr-5.11.1 \
--auto-terminate \
--instance-type m3.xlarge \
--instance-count 1 \
--ec2-attributes SubnetId=subnet-000000 \
--steps '[{
    "Type": "SPARK",
    "Name": "spark-program",
    "Args": ["--class","--init-keyspaces"],
    "Jar": "s3://mybucket/snapshots/0.1.0-SNAPSHOT/2.11/my-spark-assembly-0.1.0-SNAPSHOT.jar",
    "ActionOnFailure": "TERMINATE_CLUSTER",
    "MainClass":"com.myspark.data.consumers.jobs.MyJob"
}]' \
--use-default-roles \
--log-uri 's3://mybucket/logs' \
--tags Name='spark-program' Environment='test'

Result:

When I check the step under the Steps tab in the console, I see:

JAR location: command-runner.jar
Main class: None
Arguments: spark-submit --class --init-keyspaces
Action on failure: Terminate cluster

Expected:

JAR location: s3://mybucket/snapshots/0.1.0-SNAPSHOT/2.11/my-spark-assembly-0.1.0-SNAPSHOT.jar
Main class: com.myspark.data.consumers.jobs.MyJob
Arguments: spark-submit --class --init-keyspaces
Action on failure: Terminate cluster

I've confirmed the S3 bucket and JAR are in the correct location. I'm getting the same result when using standard syntax as well.

I found that my expectation was incorrect. When a new step is created via the CLI with only JAR arguments, a Custom JAR step is created. If Spark arguments (e.g. --conf) are also passed to the CLI, a Spark step is created.
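Applied to the step in the question: as the console output above shows (JAR location command-runner.jar, Main class: None), a Spark step does not use the Jar and MainClass fields. Both have to go into Args, in spark-submit order: --class with the main class name, then the application JAR, then the application's own arguments. In the original Args, --class is followed directly by --init-keyspaces, so spark-submit would take --init-keyspaces as the class name. Below is a minimal sketch of a corrected steps.json, assuming that --init-keyspaces is an argument to MyJob rather than to spark-submit, and assuming cluster deploy mode (the "--deploy-mode", "cluster" pair is an addition of mine, not part of the original command):

```json
[
  {
    "Type": "SPARK",
    "Name": "spark-program",
    "ActionOnFailure": "TERMINATE_CLUSTER",
    "Args": [
      "--deploy-mode", "cluster",
      "--class", "com.myspark.data.consumers.jobs.MyJob",
      "s3://mybucket/snapshots/0.1.0-SNAPSHOT/2.11/my-spark-assembly-0.1.0-SNAPSHOT.jar",
      "--init-keyspaces"
    ]
  }
]
```

It can be passed to the same create-cluster command with --steps file://steps.json in place of the inline JSON. The console will still show command-runner.jar as the JAR location, with the class and JAR path listed under Arguments.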

These two step types look different in the web console. For example, the JAR location is set to command-runner.jar for Spark steps, whereas for a Custom JAR step it is set to the JAR's S3 path.
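For comparison, a step that the console displays like the Expected listing (S3 path as the JAR location, main class filled in) is a CUSTOM_JAR step. A minimal sketch reusing the values from the question, again assuming --init-keyspaces is an application argument. One caveat: EMR runs a Custom JAR step with hadoop jar rather than spark-submit, so the setup spark-submit normally does (for example, pointing the application at YARN as its master) is typically not applied, and for a Spark application the Spark step type is usually the better fit.

```json
[
  {
    "Type": "CUSTOM_JAR",
    "Name": "spark-program",
    "ActionOnFailure": "TERMINATE_CLUSTER",
    "Jar": "s3://mybucket/snapshots/0.1.0-SNAPSHOT/2.11/my-spark-assembly-0.1.0-SNAPSHOT.jar",
    "MainClass": "com.myspark.data.consumers.jobs.MyJob",
    "Args": ["--init-keyspaces"]
  }
]
```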

AWS Custom Spark Step Documentation https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-submit-step.html
