AWS Step Functions: timed out Fargate task not automatically killed

I have an AWS Step Function that is configured to run a Fargate task, wait for it to complete, and then do some other work. The Fargate task is a long-running process that can get stuck during execution. To guard against this, I have configured a TimeoutSeconds parameter in the Task state definition:

StartAt: FargateWorker
States:
  FargateWorker:
    Type: Task
    Resource: arn:aws:states:::ecs:runTask.waitForTaskToken
    InputPath: $
    ResultPath: $.workerResult
    OutputPath: $
    TimeoutSeconds: 3
    Parameters:
      Cluster: "#{EcsCluster}"
      TaskDefinition: "#{EcsTaskDefinition}"
      LaunchType: FARGATE
      EnableExecuteCommand: true
      NetworkConfiguration:
        AwsvpcConfiguration:
          Subnets:
            - xxx
            - yyy
            - zzz
          AssignPublicIp: DISABLED
      Overrides:
        ContainerOverrides:
          - Name: container-${env:STACK_NAME}
            Environment:
              - Name: TASK_TOKEN
                "Value.$": $$.Task.Token
    Catch:
      - ErrorEquals: ["States.ALL"]
        Next: CatchAllFallback
    Next: Done

I can see that the state machine correctly moves to the CatchAllFallback state once TimeoutSeconds has elapsed, but the Fargate container keeps running; the state machine does not kill it. I need the container to be killed when the timeout triggers, so that I don't end up with a lot of zombie containers running until someone intervenes manually. Can AWS handle this automatically in some way? Or is there any other solution?

One way to handle it would be to catch the timeout specifically and run a step that kills the Fargate task, like so:

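// Kill Task Lambda. Reference: AWS SDK for JavaScript (v2) documentation for ECS.stopTask
const AWS = require('aws-sdk');
const ecs = new AWS.ECS();

var params = {
  task: 'STRING_VALUE', /* required: ID or full ARN of the task to stop */
  cluster: 'STRING_VALUE', /* short name or ARN of the cluster hosting the task */
  reason: 'STRING_VALUE' /* optional message explaining why the task is stopped */
};
ecs.stopTask(params, function(err, data) {
  if (err) console.log(err, err.stack); // an error occurred
  else     console.log(data);           // successful response
});

# State Machine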
StartAt: FargateWorker
States:
  FargateWorker:
    Type: Task
    Resource: arn:aws:states:::ecs:runTask.waitForTaskToken
    InputPath: $
    ResultPath: $.workerResult
    OutputPath: $
    TimeoutSeconds: 3
    Parameters:
      Cluster: "#{EcsCluster}"
      TaskDefinition: "#{EcsTaskDefinition}"
      LaunchType: FARGATE
      EnableExecuteCommand: true
      NetworkConfiguration:
        AwsvpcConfiguration:
          Subnets:
            - xxx
            - yyy
            - zzz
          AssignPublicIp: DISABLED
      Overrides:
        ContainerOverrides:
          - Name: container-${env:STACK_NAME}
            Environment:
              - Name: TASK_TOKEN
                "Value.$": $$.Task.Token
    Catch:
      - ErrorEquals: ["States.Timeout"]
        Next: StopTimedOutTask
    Next: Done

  StopTimedOutTask:
    Type: Task
    Resource:
      Fn::GetAtt:
        - initializer
        - Arn
    ResultPath: $.filesInfo
    Next: ArchiveTransformAndSave
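
To make the StopTimedOutTask step more concrete, here is a minimal sketch of what the Lambda behind it might look like. It is only an illustration under stated assumptions: the field names `taskArn`, `cluster` and `reason` in the state input are hypothetical, and the state machine above does not show how the task ARN would reach this step, so that part would need to be solved separately. The helper names `buildStopParams` and `makeHandler` are also made up for this example. The ECS client is injected so the logic can be exercised without the AWS SDK.

```javascript
// Sketch of a Lambda handler for the StopTimedOutTask state.
// ASSUMPTION: the state input carries `taskArn` and optionally `cluster` and
// `reason` (hypothetical field names, not taken from the original state machine).

// Build the parameter object for ECS stopTask from the state input.
function buildStopParams(event) {
  if (!event || !event.taskArn) {
    throw new Error('taskArn is required to stop a task');
  }
  const params = {
    task: event.taskArn,
    reason: event.reason || 'Step Functions state timed out',
  };
  // Only set the cluster when one was provided in the input.
  if (event.cluster) params.cluster = event.cluster;
  return params;
}

// Create a handler bound to an ECS client, so a fake client can be injected.
function makeHandler(ecs) {
  return async function handler(event) {
    const params = buildStopParams(event);
    // AWS SDK v2 style call: stopTask(params).promise()
    const data = await ecs.stopTask(params).promise();
    return {
      stopped: params.task,
      lastStatus: data.task && data.task.lastStatus,
    };
  };
}

// Wiring with the real client (AWS SDK for JavaScript v2), left commented out
// so the sketch runs without the SDK installed:
// const AWS = require('aws-sdk');
// exports.handler = makeHandler(new AWS.ECS());
```

With this shape, the returned object would be written to whatever ResultPath the StopTimedOutTask state declares, and a missing task ARN fails the step loudly instead of silently leaving the container running.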
