AWS Step Functions EMR PySpark Task Step failed

I have a working EMR step that takes about 500 seconds. The EMR controller log for the working step shows:

2021-09-19T18:36:59.786Z INFO StepRunner: Created Runner for step 7
INFO startExec 'hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar spark-submit --deploy-mode cluster s3://emr/scripts/aggregate.py --day 20210915 --base_uri s3://'
...
2021-09-19T18:46:35.934Z INFO Step succeeded with exitCode 0 and took 576 seconds

When I try to run the same step through Step Functions, the spark-submit command looks identical, but the step fails:

2021-09-19T18:36:59.786Z INFO StepRunner: Created Runner for step 7
INFO startExec 'hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar spark-submit --deploy-mode cluster s3://emr/scripts/aggregate.py --day 20210915 --base_uri s3://'
...
2021-09-22T09:56:07.309Z WARN Step failed with exitCode 1 and took 2 seconds

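STDERR shows: Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Cannot run program "spark-submit" (in directory "."): error=2, No such file or directory

What is the correct way to run a PySpark script using Step Functions?

Step definition: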

"EMR AddStep - Aggregate Daily": {
  "Type": "Task",
  "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
  "Parameters": {
    "ClusterId.$": "$.cluster.Cluster.Id",
    "Step": {
      "Name": "Aggregate Daily",
      "ActionOnFailure": "CONTINUE",
      "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": [
          "spark-submit",
          "--deploy-mode",
          "cluster",
          "s3://emr/scripts/aggregate.py",
          "--day",
          "20210915",
          "--base_uri",
          "s3://"
        ]
      }
    }
  }

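The problem was that the EMR cluster had been created without the Spark application, so the spark-submit executable did not exist on the cluster.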
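One way to confirm this diagnosis is to look at the applications reported for the cluster. The following is a minimal sketch, not a definitive check: the helper names `installed_applications` and `has_spark` are hypothetical, and it assumes the response has the shape returned by boto3's EMR `describe_cluster` call (a `Cluster` object with an `Applications` list of `Name` entries). The cluster ID in the comment is a placeholder.

```python
def installed_applications(describe_cluster_response):
    """Return the lower-cased names of the applications listed for a cluster."""
    cluster = describe_cluster_response.get("Cluster", {})
    apps = cluster.get("Applications", [])
    return {app["Name"].lower() for app in apps if "Name" in app}


def has_spark(describe_cluster_response):
    """True if Spark appears among the cluster's installed applications."""
    return "spark" in installed_applications(describe_cluster_response)


# Against a real cluster (needs boto3 and AWS credentials; placeholder cluster ID):
#   import boto3
#   response = boto3.client("emr").describe_cluster(ClusterId="j-XXXXXXXXXXXXX")
#   print(has_spark(response))

# Offline example: a hand-written response for a cluster created with Hive only.
sample_response = {"Cluster": {"Id": "j-EXAMPLE", "Applications": [{"Name": "Hive"}]}}
print(has_spark(sample_response))  # False
```

If this returns False for the cluster that Step Functions created, spark-submit will not be on the path and command-runner.jar fails as shown in the STDERR above.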

The solution is to add an entry with "Name": "Spark" under the "Applications" property of the Step Functions createCluster task:

    "EMR CreateCluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
      "Parameters": {
        "Name": "ExampleCluster",
        "VisibleToAllUsers": true,
        "ReleaseLabel": "emr-5.33.0",
        "Applications": [
          {
            "Name": "Hive"
          },
          {
            "Name": "Spark"
          }
        ],
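Each application needs its own object in the "Applications" list. The short Python sketch below illustrates why: when a single JSON object repeats the "Name" key, Python's `json` module keeps only the last value, and other parsers may behave differently. The helper `applications` is hypothetical and only builds the list in the shape used by the createCluster parameters.

```python
import json

# A single object with two "Name" keys: Python's json module keeps only the last one.
collapsed = json.loads('[{"Name": "Hive", "Name": "Spark"}]')
print(collapsed)  # [{'Name': 'Spark'}]


def applications(*names):
    """Build an Applications list with one object per application name."""
    return [{"Name": name} for name in names]


# One object per application, ready to embed in the createCluster "Parameters".
print(json.dumps(applications("Hive", "Spark"), indent=2))
```

Generating the list this way keeps every application in its own entry, so adding or removing one does not silently overwrite another.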
