EMR Cluster creation fails on the step

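My first attempt to create an EMR cluster from a Lambda function failed with the error below. I intend to use script-runner.jar to launch a Python script stored in an S3 bucket. Can someone help me understand this error? What exactly am I missing?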

2019-11-21T20:34:59.990Z INFO Ensure step 1 jar file s3a://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
INFO Failed to download: s3a://<region>.elasticmapreduce/libs/script-runner/script-runner.jar
java.io.IOException: Unable to download 's3a://<region>.elasticmapreduce/libs/script-runner/script-runner.jar'. Only s3 + local files are supported
    at aws157.instancecontroller.util.S3Wrapper.fetchHadoopFileToLocal(S3Wrapper.java:353)
    at aws157.instancecontroller.master.steprunner.HadoopJarStepRunner$Runner.<init>(HadoopJarStepRunner.java:243)
    at aws157.instancecontroller.master.steprunner.HadoopJarStepRunner.createRunner(HadoopJarStepRunner.java:152)
    at aws157.instancecontroller.master.steprunner.HadoopJarStepRunner.createRunner(HadoopJarStepRunner.java:146)
    at aws157.instancecontroller.master.steprunner.StepExecutor.runStep(StepExecutor.java:136)
    at aws157.instancecontroller.master.steprunner.StepExecutor.run(StepExecutor.java:70)
    at aws157.instancecontroller.master.steprunner.StepExecutionManager.enqueueStep(StepExecutionManager.java:246)
    at aws157.instancecontroller.master.steprunner.StepExecutionManager.doRun(StepExecutionManager.java:193)
    at aws157.instancecontroller.master.steprunner.StepExecutionManager.access$000(StepExecutionManager.java:33)
    at aws157.instancecontroller.master.steprunner.StepExecutionManager$1.run(StepExecutionManager.java:94)

My loosely written Lambda function is as follows:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import json
import boto3
import datetime


def lambda_handler(event, context):
    print ('Creating EMR')
    connection = boto3.client('emr', region_name='us-east-1')
    print (event)

    cluster_id = connection.run_job_flow(
        Name='MyTest',
        VisibleToAllUsers=True,
        JobFlowRole='EMR_EC2_DefaultRole',
        ServiceRole='EMR_DefaultRole',
        LogUri='s3://bucket-emr/logs',
        ReleaseLabel='emr-5.21.0',
        Applications=[{'Name': 'Hadoop'}, {'Name': 'Spark'}],
        Instances={
            'InstanceGroups': [{
                'Name': 'Master nodes',
                'Market': 'ON_DEMAND',
                'InstanceRole': 'MASTER',
                'InstanceType': 'm3.xlarge',
                'InstanceCount': 1,
                }, {
                'Name': 'Slave nodes',
                'Market': 'SPOT',
                'InstanceRole': 'CORE',
                'InstanceType': 'm3.xlarge',
                'InstanceCount': 2,
                }],
            'KeepJobFlowAliveWhenNoSteps': True,
            'Ec2KeyName': 'keys-kvp',
            'Ec2SubnetId': 'subnet-dsb65490',
            'EmrManagedMasterSecurityGroup': 'sg-0daa54d041d1033',
            'EmrManagedSlaveSecurityGroup': 'sg-0daa54d041d1033',
            },
        Configurations=[{
            "Classification": "spark-env",
            "Properties": {},
            "Configurations": [{
                "Classification": "export",
                "Properties": {
                    "PYSPARK_PYTHON": "python36",
                    "PYSPARK_DRIVER_PYTHON": "python36"
                }
            }]
        }],
        Steps=[{
            'Name': 'mystep',
            'ActionOnFailure': 'TERMINATE_CLUSTER',
            'HadoopJarStep': {
                'Jar': 's3a://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar',
                'Args': [
                    '/home/hadoop/spark/bin/spark-submit', '--deploy-mode', 'cluster', '--master', 'yarn', 's3a://inscape-script/wordcount.py',
                ]
            }
        }]
    )

    # run_job_flow returns a response dict; the cluster ID is under 'JobFlowId'
    return 'Started cluster {}'.format(cluster_id['JobFlowId'])

What am I missing when creating the cluster? Thanks in advance.

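The step fails because the jar URI uses the s3a:// scheme, and the log says only s3 and local files are supported. Can you try changing your 'Jar' parameter to this?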

'Jar': 's3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar',

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hadoop-script.html
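
As a minimal sketch of that change, the step definition could be built by a small helper so the jar URI always uses the s3:// scheme. The function name script_runner_step is hypothetical and only for illustration; the dictionary keys and the arguments are the ones from the question, left unchanged apart from the jar scheme.

```python
def script_runner_step(name, region, script_args):
    """Build an EMR step dict that loads script-runner.jar over s3://.

    Hypothetical helper for illustration; the keys mirror the Steps entry
    passed to run_job_flow in the question.
    """
    # s3:// rather than s3a://, since the step runner rejected s3a:// in the log
    jar = 's3://{}.elasticmapreduce/libs/script-runner/script-runner.jar'.format(region)
    return {
        'Name': name,
        'ActionOnFailure': 'TERMINATE_CLUSTER',
        'HadoopJarStep': {
            'Jar': jar,
            'Args': list(script_args),
        },
    }


# Arguments copied from the question's step definition
step = script_runner_step(
    'mystep',
    'us-east-1',
    ['/home/hadoop/spark/bin/spark-submit', '--deploy-mode', 'cluster',
     '--master', 'yarn', 's3a://inscape-script/wordcount.py'],
)
print(step['HadoopJarStep']['Jar'])
```

The resulting dict can be passed as Steps=[step] in the run_job_flow call.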

You can also try using command-runner by changing the 'Jar' parameter to:

/var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar
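
A sketch of what the command-runner variant of the step might look like is below. It rests on two assumptions not stated in the thread: that spark-submit is on the PATH of the master node (so the absolute /home/hadoop/spark/bin path is not needed), and that the script URI is written with s3:// instead of s3a://. The helper name command_runner_step is hypothetical.

```python
def command_runner_step(name, script_uri):
    """Build an EMR step dict that runs spark-submit through command-runner.jar.

    Hypothetical helper; assumes spark-submit is on the master node's PATH.
    """
    jar = '/var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar'
    return {
        'Name': name,
        'ActionOnFailure': 'TERMINATE_CLUSTER',
        'HadoopJarStep': {
            'Jar': jar,
            # command-runner executes its arguments as a command on the master node
            'Args': ['spark-submit', '--deploy-mode', 'cluster',
                     '--master', 'yarn', script_uri],
        },
    }


# Assumption: the script URI from the question, rewritten with the s3:// scheme
step = command_runner_step('mystep', 's3://inscape-script/wordcount.py')
print(step['HadoopJarStep']['Args'])
```

As with the script-runner version, the dict would go into the Steps list of run_job_flow.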
