
AWS Step Functions: timed-out Fargate task not automatically killed

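I have an AWS Step Functions state machine configured to run a Fargate task, wait for it to finish, and then do some other work. The Fargate task is a long-running process that can get stuck during execution. To guard against this, I configured the TimeoutSeconds parameter in the state definition: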

StartAt: FargateWorker
States:
  FargateWorker:
    Type: Task
    Resource: arn:aws:states:::ecs:runTask.waitForTaskToken
    InputPath: $
    ResultPath: $.workerResult
    OutputPath: $
    TimeoutSeconds: 3
    Parameters:
      Cluster: "#{EcsCluster}"
      TaskDefinition: "#{EcsTaskDefinition}"
      LaunchType: FARGATE
      EnableExecuteCommand: true
      NetworkConfiguration:
        AwsvpcConfiguration:
          Subnets:
            - xxx
            - yyy
            - zzz
          AssignPublicIp: DISABLED
      Overrides:
        ContainerOverrides:
          - Name: container-${env:STACK_NAME}
            Environment:
              - Name: TASK_TOKEN
                "Value.$": $$.Task.Token
    Catch:
      - ErrorEquals: ["States.ALL"]
        Next: CatchAllFallback
    Next: Done
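With runTask.waitForTaskToken, the state only completes when something calls back with the TASK_TOKEN that the state machine injects into the container; if nothing does, it waits until TimeoutSeconds expires. A minimal sketch of that worker-side callback, assuming the container runs Node.js with the AWS SDK for JavaScript v2. runAndReport and doWork are hypothetical names, and the Step Functions client is passed in so the logic can be exercised with a stub.

```javascript
// Worker-side sketch (hypothetical names): run the job, then report the outcome to
// Step Functions with the TASK_TOKEN the state machine passed into the container.
// `sfn` is expected to be an AWS SDK for JavaScript v2 client (new AWS.StepFunctions()).

async function runAndReport(sfn, taskToken, doWork) {
  if (!taskToken) {
    throw new Error("TASK_TOKEN is not set; was this task started by the state machine?");
  }

  let result;
  try {
    result = await doWork();
  } catch (err) {
    // The job failed: fail the waiting state so its Catch rules can handle it
    await sfn
      .sendTaskFailure({
        taskToken,
        error: (err && err.name) || "WorkerError",
        cause: String((err && err.message) || err),
      })
      .promise();
    return "failure";
  }

  // The job succeeded: `output` must be a JSON string; it ends up in $.workerResult
  const output = JSON.stringify(result === undefined ? {} : result);
  await sfn.sendTaskSuccess({ taskToken, output }).promise();
  return "success";
}

module.exports = { runAndReport };

// In the container (requires the aws-sdk v2 package):
// const AWS = require("aws-sdk");
// runAndReport(new AWS.StepFunctions(), process.env.TASK_TOKEN, myJob);
```

The success call is kept outside the try block so that a failing SendTaskSuccess is not misreported as a job failure.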

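I can see that the state machine correctly moves to CatchAllFallback once TimeoutSeconds has elapsed, but the Fargate container keeps running: the state machine does not kill it. I need the container to be killed when the timeout fires, so that I don't end up with lots of zombie containers running until someone intervenes manually. Can AWS handle this automatically somehow? Or is there another solution?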

One way to handle it might be to catch the timeout specifically and run a step like this one to stop the Fargate task:

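// Kill Task Lambda, based on the ECS stopTask example in the AWS SDK for JavaScript (v2) docs
const AWS = require('aws-sdk');
const ecs = new AWS.ECS();

exports.handler = async (event) => {
  const params = {
    task: event.taskArn,    /* required: ARN or ID of the task to stop (assumed to be in the state input) */
    cluster: event.cluster, /* cluster the task runs in; the default cluster is used if omitted */
    reason: 'Step Functions state timed out'
  };
  // Rejects on error, so a failed stop shows up as a failed Lambda state
  const data = await ecs.stopTask(params).promise();
  console.log(data);
  return data;
};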
# State Machine
StartAt: FargateWorker
States:
  FargateWorker:
    Type: Task
    Resource: arn:aws:states:::ecs:runTask.waitForTaskToken
    InputPath: $
    ResultPath: $.workerResult
    OutputPath: $
    TimeoutSeconds: 3
    Parameters:
      Cluster: "#{EcsCluster}"
      TaskDefinition: "#{EcsTaskDefinition}"
      LaunchType: FARGATE
      EnableExecuteCommand: true
      NetworkConfiguration:
        AwsvpcConfiguration:
          Subnets:
            - xxx
            - yyy
            - zzz
          AssignPublicIp: DISABLED
      Overrides:
        ContainerOverrides:
          - Name: container-${env:STACK_NAME}
            Environment:
              - Name: TASK_TOKEN
                "Value.$": $$.Task.Token
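    Catch:
      # Timeouts go to a dedicated step that stops the still-running Fargate task
      - ErrorEquals: ["States.Timeout"]
        ResultPath: $.error
        Next: StopTimedOutTask
      # Any other error still goes to the original fallback
      - ErrorEquals: ["States.ALL"]
        Next: CatchAllFallback
    Next: Done

  StopTimedOutTask:
    Type: Task
    # Replace "initializer" with the logical name of the kill-task Lambda
    Resource:
      Fn::GetAtt:
        - initializer
        - Arn
    ResultPath: $.stopTaskResult
    Next: CatchAllFallback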
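The StopTimedOutTask step needs to know which ECS task to stop, but as far as the definition above shows, nothing in the state input carries the task ARN once the wait times out. A minimal sketch of one way to find it, under hypothetical assumptions that are not part of the original answer: the FargateWorker Parameters add "StartedBy.$": $$.Execution.Name, StopTimedOutTask passes cluster and executionName to the Lambda through its own Parameters, and the handler uses the AWS SDK for JavaScript v2 ECS client (injected so it can be tested with a stub). makeStopTimedOutTaskHandler is a hypothetical name.

```javascript
// Sketch of the kill-task Lambda for StopTimedOutTask.
// ASSUMPTIONS (hypothetical, not in the original answer):
//  - RunTask was called with StartedBy set to the execution name
//  - the Lambda event looks like { cluster, executionName }
//  - `ecs` is an AWS SDK for JavaScript v2 client (new AWS.ECS())

function makeStopTimedOutTaskHandler(ecs) {
  return async (event) => {
    const { cluster, executionName } = event;
    if (!cluster || !executionName) {
      throw new Error("event must contain cluster and executionName");
    }

    // Find tasks started by this execution that are still meant to be running.
    // Only the first page of ListTasks results is read.
    const { taskArns = [] } = await ecs
      .listTasks({ cluster, startedBy: executionName, desiredStatus: "RUNNING" })
      .promise();

    for (const task of taskArns) {
      await ecs
        .stopTask({ cluster, task, reason: `Step Functions execution ${executionName} timed out` })
        .promise();
    }
    return { stoppedTaskArns: taskArns };
  };
}

module.exports = { makeStopTimedOutTaskHandler };

// Lambda entry point (requires the aws-sdk v2 package):
// const AWS = require("aws-sdk");
// exports.handler = makeStopTimedOutTaskHandler(new AWS.ECS());
```

Reading a single page of ListTasks is enough when each execution starts exactly one task, as in the state machine above; a workflow that starts many tasks per execution would need to follow nextToken.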


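Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site or the original source. For any questions, contact: yoyou2525@163.com.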
