AWS Step Functions: How can I access, in a Catch block, the input of the state that raised the exception?

Question

I am trying to add error handling to my Step Functions flow using the Parallel state and Catch blocks as defined in the Amazon States Language.

The following is the flow diagram of my state machine:

Since I want a common error handler for all the steps, I have wrapped them in a Parallel block and added a common Catch block to catch errors from any of them. After looking through various examples and blogs, I followed this link and implemented a similar approach.

What I observe is that whenever any state raises an exception, control passes to the Catch block. The input to the Catch block is the exception that was raised, as a JSON object containing Error and Cause. Since I wanted the error along with the input that was passed to that state, I set ResultPath to "$.error" in the Catch block. The following JSON spec defines the state machine.

{
  "StartAt": "Try",
  "States": {
    "Try": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "Step-1",
          "States": {
            "Step-1": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:eu-west-1:1234:function:step-1-lambda",
              "Next": "Step-2"
            },
            "Step-2": {
              "Type": "Choice",
              "Choices": [
                {
                  "Variable": "$.some_variable",
                  "StringEquals": "some_string",
                  "Next": "Step-3"
                },
                {
                  "Variable": "$.some_variable",
                  "StringEquals": "some_other_string",
                  "Next": "Step-4"
                }
              ],
              "Default": "Step-6"
            },
            "Step-3": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:eu-west-1:1234:function:step-3-lambda",
              "Next": "Step-6"
            },
            "Step-4": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:eu-west-1:1234:function:step-4-lambda",
              "Next": "Step-6"
            },
            "Step-6": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:eu-west-1:1234:function:step-6-lambda",
              "End": true
            }
          }
        }
      ],
      "Catch": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "ResultPath": "$.error",
          "Next": "ErrorHandler"
        }
      ],
      "Next": "UnwrapOutput"
    },
    "UnwrapOutput": {
      "Type": "Pass",
      "InputPath": "$[0]",
      "End": true
    },
    "ErrorHandler": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:eu-west-1:1234:function:step-7-lambda",
      "End": true
    }
  }
}

For example, suppose that Step-4 raises an exception. The input to this state is:

{
  "foo": "abc",
  "bar": "def"
}

The input with which the state machine is triggered is:

{
  "boo": "jkl",
  "baz": "mno"
}

Since Step-4 raises the exception, I was expecting the input to the ErrorHandler state to be:

{
  "foo": "abc",
  "bar": "def",
  "error": {
    "Error": "SomeError",
    "Cause": "SomeCause"
  }
}

However, the input received consists of the original input that was used to trigger the flow:

{
  "boo": "jkl",
  "baz": "mno",
  "error": {
    "Error": "SomeError",
    "Cause": "SomeCause"
  }
}

I need to access, in the ErrorHandler, the input fields of the state that caused the exception. Using "$" gives me the input that was used to trigger the flow. Is there a way I can achieve this?

Any help would be appreciated; I have been trying to figure this out for a long time.

Answer 1

I'm only 10 months late, haha. I hope you have already found a solution, but in any case I will share my two cents so that it can help another dev or, even better, so that someone can show me a better way to do this!

First, let's see what scenarios we have:

Synchronous job execution
Asynchronous job execution

Our goal: to somehow identify the job that triggered the error.

First solution - Applies to all scenarios:

Basically, add custom try/catch blocks to all your job assets. In other words, your Lambda functions should throw an error that carries information about the job that is running them. I don't like this approach much, because you are changing your isolated functions in order to achieve some logic in your state machine. In the end you are coupling two separate concepts: your state machine shouldn't need external tools to operate and log its own context. I could be wrong here, but that's only my two cents; feel free to offend my family (just kidding, but correct me as you wish).

Second solution - Applies to synchronous job execution

When you add a Catch to a state ("addCatch" in CDK), the default behavior is for the error output to overwrite the step input. To solve this, you only need to change the resultPath of the Catch; this way the error output is stored alongside the step input.
Example: "Catch": [ { "ErrorEquals": [ "States.ALL" ], "Next": "ErrorHandler", "ResultPath": "$.error-info" } ]
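
To make the effect of ResultPath concrete, here is a minimal TypeScript sketch that models it with the inputs from the question. It is only a simplified model, not how Step Functions is implemented: the function name applyCatchResultPath is hypothetical, and only "$" and top-level paths such as "$.error" are handled. The point it illustrates is that the error is merged with the input of the state that declares the Catch.

```typescript
type Json = Record<string, unknown>;

interface ErrorOutput {
  Error: string;
  Cause: string;
}

// Hypothetical helper: models how a Catch combines the error output with the
// input of the state that declares the Catch.
function applyCatchResultPath(stateInput: Json, errorOutput: ErrorOutput, resultPath: string): Json {
  if (resultPath === "$") {
    // Default behavior: the error output replaces the state input entirely.
    return { ...errorOutput };
  }
  if (!resultPath.startsWith("$.")) {
    throw new Error(`Unsupported ResultPath in this sketch: ${resultPath}`);
  }
  const key = resultPath.slice(2);
  if (key.length === 0 || key.includes(".")) {
    throw new Error(`Only top-level paths are modelled here: ${resultPath}`);
  }
  // The error output is stored under `key`, next to the original input fields.
  return { ...stateInput, [key]: errorOutput };
}

const parallelInput: Json = { boo: "jkl", baz: "mno" }; // input of the Parallel state "Try"
const step4Input: Json = { foo: "abc", bar: "def" };    // input of Step-4
const raised: ErrorOutput = { Error: "SomeError", Cause: "SomeCause" };

// Catch declared on the Parallel state: merged with the Parallel state's input.
const caughtOnParallel = applyCatchResultPath(parallelInput, raised, "$.error");
// Catch declared on Step-4 itself: merged with Step-4's input.
const caughtOnTask = applyCatchResultPath(step4Input, raised, "$.error-info");

console.log(JSON.stringify(caughtOnParallel));
console.log(JSON.stringify(caughtOnTask));
```

In this model, only the Catch declared on the task itself yields the "foo" and "bar" fields alongside the error.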

But why is this important?

This way you can access the step input in the errorHandlerJob, which means you can always pass the stepName along in the step input, so you always know which job failed. And you do this not by changing your Lambda function directly but by using the job's properties, which solves the coupling issue. But this won't work in the asynchronous scenario, as I'll explain next.

Third solution - Applies to asynchronous job execution

The previous solution won't work here as is, because a Catch on the Parallel state itself only gives you access to the original input. So what I did here was similar to the previous case. I added a Pass state at the start of each parallel branch; each Pass state records the name of the step and then hands over to the job, which runs synchronously inside its branch, and every job has its own errorHandlingJob (NOT DIFFERENT LAMBDA FUNCTIONS, THOUGH). I'm not creating new resources on AWS: there is only one HandleError Lambda function, so I can focus my monitoring on that specific function. But I use it to create one errorHandlingJob for each job my state machine has to execute.
The downside is the huge graph your state machine now has, but the good part is that you can now log which job failed.
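
To illustrate the shape of one such branch, here is a minimal sketch that builds its Amazon States Language definition as a plain TypeScript object. The names buildBranch, BranchSpec and StateMachineBranch, as well as the generated state names, are hypothetical, and the ARNs are placeholders borrowed from the question.

```typescript
interface BranchSpec {
  stepName: string;        // name recorded by the Pass state, e.g. "job1"
  taskArn: string;         // ARN of the job's Lambda function
  errorHandlerArn: string; // ARN of the single shared error-handling Lambda
}

interface StateMachineBranch {
  StartAt: string;
  States: Record<string, Record<string, unknown>>;
}

// Builds one Parallel branch: Pass (records the step name) -> Task (with its
// own Catch) -> error handler state that reuses the shared Lambda function.
function buildBranch({ stepName, taskArn, errorHandlerArn }: BranchSpec): StateMachineBranch {
  const passName = `Pass: ${stepName}`;
  const handlerName = `Handle Error: ${stepName}`;
  return {
    StartAt: passName,
    States: {
      [passName]: {
        Type: "Pass",
        Result: { step: stepName },
        ResultPath: "$.step-info", // keep the branch input, add the step name
        Next: stepName,
      },
      [stepName]: {
        Type: "Task",
        Resource: taskArn,
        Catch: [
          // The error is stored next to the task input, which includes step-info.
          { ErrorEquals: ["States.ALL"], ResultPath: "$.error-info", Next: handlerName },
        ],
        End: true,
      },
      [handlerName]: {
        Type: "Task",
        Resource: errorHandlerArn,
        End: true,
      },
    },
  };
}

const branch = buildBranch({
  stepName: "job1",
  taskArn: "arn:aws:lambda:eu-west-1:1234:function:step-1-lambda",
  errorHandlerArn: "arn:aws:lambda:eu-west-1:1234:function:step-7-lambda",
});
console.log(JSON.stringify(branch, null, 2));
```

Each branch gets its own error handler state, but every one of them points at the same Lambda ARN, which matches the idea of monitoring a single function.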

Without any abstraction, the state machine would look something like this using CDK:

    // Imports assume monocdk, as in the ParallelStateMachineCatch class further down.
    import * as cdk from 'monocdk'
    import * as sfn from 'monocdk/aws-stepfunctions'
    import * as tasks from 'monocdk/aws-stepfunctions-tasks'

    // `scope`, `id` and the Lambda functions (function1, function2, function3,
    // functionError, functionThrowError) are assumed to be defined elsewhere.
    const job1 = new tasks.LambdaInvoke(scope, 'First Job -- PASS', {
        lambdaFunction: function1,
        outputPath: '$.Payload'
    })

    const job2 = new tasks.LambdaInvoke(scope, 'Second Job -- PASS', {
        lambdaFunction: function2,
        outputPath: '$.Payload'
    })

    const job3 = new tasks.LambdaInvoke(scope, 'Third Job -- PASS', {
        lambdaFunction: function3,
        outputPath: '$.Payload'
    })

    // One error-handling state per job, all backed by the same Lambda function.
    const generateHandleErrorJob = () => new tasks.LambdaInvoke(scope, `Handle Error Job ${Math.random() * 160000000}`, {
        lambdaFunction: functionError,
        outputPath: '$.Payload'
    })

    const jobToThrowError = new tasks.LambdaInvoke(scope, 'Job To Throw Error -- PASS', {
        lambdaFunction: functionThrowError,
        outputPath: '$.Payload',
    })

    // Pass state that records the step name under $.step-info before the job runs.
    const generatePassCheckStep = (stepName: string) => new sfn.Pass(scope, `Pass: ${stepName}`, {
        resultPath: '$.step-info',
        result: sfn.Result.fromObject({
            step: stepName
        })
    })

    // Each branch: Pass -> job, with a Catch on the job that keeps its input.
    const definition = new sfn.Parallel(scope, 'Parallel Execution -- PASS')
        .branch(generatePassCheckStep('job1').next(job1.addCatch(generateHandleErrorJob(), {resultPath: '$.error-info'})))
        .branch(generatePassCheckStep('jobToThrowError').next(jobToThrowError.addCatch(generateHandleErrorJob(), {resultPath: '$.error-info'})))
        .branch(generatePassCheckStep('job2').next(job2.addCatch(generateHandleErrorJob(), {resultPath: '$.error-info'})))
        .next(job3)

    new sfn.StateMachine(scope, id, {
        definition,
        timeout: cdk.Duration.minutes(3)
    })
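
With a definition like the one above, the shared error handler should receive the failed job's input together with the "step-info" and "error-info" fields. The following is a minimal sketch of how such a handler could read them; the names ErrorHandlerEvent, FailureReport, buildFailureReport and handleError are hypothetical, and the event shape is an assumption based on the resultPath values used above.

```typescript
// Assumed input: the failed job's input, plus the fields written by the Pass
// state ($.step-info) and by the Catch ($.error-info).
interface ErrorHandlerEvent {
  "step-info": { step: string };
  "error-info": { Error: string; Cause: string };
  [key: string]: unknown; // remaining fields of the failed job's input
}

interface FailureReport {
  failedStep: string;
  error: string;
  cause: string;
  stepInput: Record<string, unknown>;
}

// Separates the bookkeeping fields from the job's own input.
function buildFailureReport(event: ErrorHandlerEvent): FailureReport {
  const { "step-info": stepInfo, "error-info": errorInfo, ...stepInput } = event;
  return {
    failedStep: stepInfo.step,
    error: errorInfo.Error,
    cause: errorInfo.Cause,
    stepInput,
  };
}

// Hypothetical Lambda entry point; a real function would export this as its handler.
async function handleError(event: ErrorHandlerEvent): Promise<FailureReport> {
  const report = buildFailureReport(event);
  console.error(`Step "${report.failedStep}" failed with ${report.error}`);
  return report;
}

const sampleEvent: ErrorHandlerEvent = {
  foo: "abc",
  bar: "def",
  "step-info": { step: "jobToThrowError" },
  "error-info": { Error: "SomeError", Cause: "SomeCause" },
};
const sampleReport = buildFailureReport(sampleEvent);
console.log(JSON.stringify(sampleReport));
```

Keeping the parsing in a plain function makes it easy to check locally, without deploying anything.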

But I also created an abstraction, "ParallelStateMachineCatch", so you can use it like this:

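// The construct id and the ParallelProps object were cut off in the original
// post; `id` and `{}` below are placeholders, not the author's values.
this.definition = new ParallelStateMachineCatch(this, id, {}, handleErrorFunction)
  .branchCatch(job1)
  .branchCatch(job2)
  .branchCatch(job3)
  .branchCatch(job4)
  .branchCatch(job5)
  .branchCatch(job6)
  .next(final)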

Here's the ParallelStateMachineCatch code:

import { Construct, Duration } from 'monocdk'
import { NodejsFunction } from 'monocdk/aws-lambda-nodejs'
import { Pass, Result, Parallel, ParallelProps } from 'monocdk/aws-stepfunctions'
import { LambdaInvoke } from 'monocdk/aws-stepfunctions-tasks'

export interface DefinitionProps {
  sonosEnvironment: string
  region: string
  accountNumber: string
}

export class ParallelStateMachineCatch extends Parallel {
  private errorHandler: NodejsFunction

  constructor(scope: Construct, id: string, props: ParallelProps, errorHandler: NodejsFunction) {
    super(scope, id, props)
    this.errorHandler = errorHandler
  }



  branchCatch(task: LambdaInvoke): ParallelStateMachineCatch {
    const randomId = Math.random().toString().replace('0.', '')
    const passInputJob = ParallelStateMachineCatch.generatePassInput(this, task.id, randomId)
    const handleErrorJob = ParallelStateMachineCatch.generateHandleErrorJob(this, this.errorHandler, randomId)
    const resultPath = '$.error-info'

    this.branch(passInputJob.next(task.addCatch(handleErrorJob, { resultPath })))

    return this
  }

  private static generateHandleErrorJob(scope: Construct, errorHandler: NodejsFunction, randomId: string): LambdaInvoke {
    return new LambdaInvoke(scope, `Handle Error ${ randomId }`, {
      lambdaFunction: errorHandler,
      outputPath: '$.Payload',
      timeout: Duration.seconds(5),
    })
  }

  private static generatePassInput(scope: Construct, stepName: string, randomId: string): Pass {
    return new Pass(scope, `Pass Input ${ randomId }`, {
      resultPath: '$.step-info',
      result: Result.fromObject({
        name: stepName
      })
    })
  }

}

Anyway, I hope this helps someone; that's how I managed to solve this issue. Please feel free to teach me better ways! Thanks, good luck, and good code.

Answered 2021-06-07 14:44:25
