AWS step function does not add next step to EMR cluster when current step fails
I have set up a state machine using AWS Step Functions that creates an EMR cluster, adds a few EMR steps, and then terminates the cluster. This works fine as long as all the steps run to completion without errors. If a step fails, the execution does not continue to the next step, even though I added a Catch for exactly that purpose. Whenever a step fails, that step is marked as caught (shown in orange in the graph), but the next step is marked as cancelled.

In case it helps, here is my step function definition:
{
  "StartAt": "MyEMR-SMFlowContainer-beta",
  "States": {
    "MyEMR-SMFlowContainer-beta": {
      "Type": "Parallel",
      "End": true,
      "Branches": [
        {
          "StartAt": "CreateClusterStep-feature-generation-cluster-beta",
          "States": {
            "CreateClusterStep-feature-generation-cluster-beta": {
              "Next": "Step-SuccessfulJobOne",
              "Type": "Task",
              "ResultPath": "$.Cluster.1.CreateClusterTask",
              "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
              "Parameters": {
                "Instances": {
                  "Ec2SubnetIds": [
                    "subnet-*******345fd38423"
                  ],
                  "InstanceCount": 2,
                  "KeepJobFlowAliveWhenNoSteps": true,
                  "MasterInstanceType": "m4.xlarge",
                  "SlaveInstanceType": "m4.xlarge"
                },
                "JobFlowRole": "MyEMR-emrInstance-beta-EMRInstanceRole",
                "Name": "emr-step-fail-handle-test-cluster",
                "ServiceRole": "MyEMR-emr-beta-EMRRole",
                "Applications": [
                  {
                    "Name": "Spark"
                  },
                  {
                    "Name": "Hadoop"
                  }
                ],
                "AutoScalingRole": "MyEMR-beta-FeatureG-CreateClusterStepfeature-NJB2UG1J1EWB",
                "Configurations": [
                  {
                    "Classification": "spark-env",
                    "Configurations": [
                      {
                        "Classification": "export",
                        "Properties": {
                          "PYSPARK_PYTHON": "/usr/bin/python3"
                        }
                      }
                    ]
                  }
                ],
                "LogUri": "s3://MyEMR-beta-feature-createclusterstepfeature-1jpp1wp3dfn04/emr/logs/",
                "ReleaseLabel": "emr-5.32.0",
                "VisibleToAllUsers": true
              }
            },
            "Step-SuccessfulJobOne": {
              "Next": "Step-AlwaysFailingJob",
              "Catch": [
                {
                  "ErrorEquals": [
                    "States.ALL"
                  ],
                  "Next": "Step-AlwaysFailingJob"
                }
              ],
              "Type": "Task",
              "TimeoutSeconds": 7200,
              "ResultPath": "$.ClusterStep.SuccessfulJobOne.AddSparkTask",
              "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
              "Parameters": {
                "ClusterId.$": "$.Cluster.1.CreateClusterTask.ClusterId",
                "Step": {
                  "Name": "SuccessfulJobOne",
                  "ActionOnFailure": "CONTINUE",
                  "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                      "spark-submit",
                      "--deploy-mode",
                      "client",
                      "--master",
                      "yarn",
                      "--conf",
                      "spark.logConf=true",
                      "--class",
                      "com.test.sample.core.EMRJobRunner",
                      "s3://my-****-bucket/jars/77/my-****-bucketBundleJar-1.0.jar",
                      "--JOB_NUMBER",
                      "1",
                      "--JOB_KEY",
                      "SuccessfulJobOne"
                    ]
                  }
                }
              }
            },
            "Step-AlwaysFailingJob": {
              "Next": "Step-SuccessfulJobTwo",
              "Catch": [
                {
                  "ErrorEquals": [
                    "States.ALL"
                  ],
                  "Next": "Step-SuccessfulJobTwo"
                }
              ],
              "Type": "Task",
              "TimeoutSeconds": 7200,
              "ResultPath": "$.ClusterStep.AlwaysFailingJob.AddSparkTask",
              "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
              "Parameters": {
                "ClusterId.$": "$.Cluster.1.CreateClusterTask.ClusterId",
                "Step": {
                  "Name": "AlwaysFailingJob",
                  "ActionOnFailure": "CONTINUE",
                  "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                      "spark-submit",
                      "--deploy-mode",
                      "client",
                      "--master",
                      "yarn",
                      "--conf",
                      "spark.logConf=true",
                      "--class",
                      "com.test.sample.core.EMRJobRunner",
                      "s3://my-****-bucket/jars/77/my-****-bucketBundleJar-1.0.jar",
                      "--JOB_NUMBER",
                      "2",
                      "--JOB_KEY",
                      "AlwaysFailingJob"
                    ]
                  }
                }
              }
            },
            "Step-SuccessfulJobTwo": {
              "Next": "TerminateClusterStep-feature-generation-cluster-beta",
              "Catch": [
                {
                  "ErrorEquals": [
                    "States.ALL"
                  ],
                  "Next": "TerminateClusterStep-feature-generation-cluster-beta"
                }
              ],
              "Type": "Task",
              "TimeoutSeconds": 7200,
              "ResultPath": "$.ClusterStep.SuccessfulJobTwo.AddSparkTask",
              "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
              "Parameters": {
                "ClusterId.$": "$.Cluster.1.CreateClusterTask.ClusterId",
                "Step": {
                  "Name": "DeviceJob",
                  "ActionOnFailure": "CONTINUE",
                  "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                      "spark-submit",
                      "--deploy-mode",
                      "client",
                      "--master",
                      "yarn",
                      "--conf",
                      "spark.logConf=true",
                      "--class",
                      "com.test.sample.core.EMRJobRunner",
                      "s3://my-****-bucket/jars/77/my-****-bucketBundleJar-1.0.jar",
                      "--JOB_NUMBER",
                      "3",
                      "--JOB_KEY",
                      "SuccessfulJobTwo"
                    ]
                  }
                }
              }
            },
            "TerminateClusterStep-feature-generation-cluster-beta": {
              "End": true,
              "Type": "Task",
              "ResultPath": null,
              "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster.sync",
              "Parameters": {
                "ClusterId.$": "$.Cluster.1.CreateClusterTask.ClusterId"
              }
            }
          }
        }
      ]
    }
  },
  "TimeoutSeconds": 43200
}
Can someone suggest how I can catch a failure in a step, ignore it, and still add the next step? Thanks in advance.
The problem was that I did not specify a ResultPath in the Catch property. Because ResultPath defaults to $, the catch block overwrote the entire state output with the error information. The next step was cancelled because it could no longer read the cluster information, which had been overwritten.
"Catch": [
  {
    "ErrorEquals": [
      "States.ALL"
    ],
    "Next": "Step-SuccessfulJobTwo"
  }
],
Once I updated the Catch with a proper ResultPath, it worked as expected:
"Catch": [
  {
    "ErrorEquals": [
      "States.ALL"
    ],
    "Next": "Step-SuccessfulJobTwo",
    "ResultPath": "$.ClusterStep.SuccessfulJobOne.AddSparkTask.Error"
  }
],
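To see why this fixes the problem: with an explicit ResultPath, the error object produced by Catch is merged into the state input at that path instead of replacing the whole document, so fields such as the cluster ID survive for later states. A sketch of what the state output might look like after a caught failure (the cluster ID and cause text are illustrative placeholders, not taken from a real execution):

```json
{
  "Cluster": {
    "1": {
      "CreateClusterTask": {
        "ClusterId": "j-XXXXXXXXXXXXX"
      }
    }
  },
  "ClusterStep": {
    "SuccessfulJobOne": {
      "AddSparkTask": {
        "Error": {
          "Error": "States.TaskFailed",
          "Cause": "(error details from the failed EMR step)"
        }
      }
    }
  }
}
```

With the default ResultPath of $, only the Error/Cause object would remain as state output, so the next step's reference to $.Cluster.1.CreateClusterTask.ClusterId could no longer resolve.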