I have an AWS Step Functions state machine that runs a Fargate task, waits for it to complete, and then does some other work. The Fargate task is a long-running process that can get stuck during execution. To guard against this, I have configured the TimeoutSeconds parameter on the task state:
StartAt: FargateWorker
States:
  FargateWorker:
    Type: Task
    Resource: arn:aws:states:::ecs:runTask.waitForTaskToken
    InputPath: $
    ResultPath: $.workerResult
    OutputPath: $
    TimeoutSeconds: 3
    Parameters:
      Cluster: "#{EcsCluster}"
      TaskDefinition: "#{EcsTaskDefinition}"
      LaunchType: FARGATE
      EnableExecuteCommand: true
      NetworkConfiguration:
        AwsvpcConfiguration:
          Subnets:
            - xxx
            - yyy
            - zzz
          AssignPublicIp: DISABLED
      Overrides:
        ContainerOverrides:
          - Name: container-${env:STACK_NAME}
            Environment:
              - Name: TASK_TOKEN
                "Value.$": $$.Task.Token
    Catch:
      - ErrorEquals: ["States.ALL"]
        Next: CatchAllFallback
    Next: Done
I can see that the state machine correctly moves to the CatchAllFallback state after TimeoutSeconds have elapsed, but the Fargate container keeps running; the state machine doesn't kill it. I need the container to be killed when the timeout triggers, so that I don't end up with a lot of zombie containers running until someone intervenes manually. Can AWS handle this automatically in some way? Or is there another solution?
One way to handle it would be to catch the timeout specifically and run a step that kills the Fargate task, like so:
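// Kill Task Lambda. Adapted from the AWS SDK for JavaScript (v2) docs for ECS.stopTask
const AWS = require('aws-sdk');
const ecs = new AWS.ECS();

var params = {
  task: 'STRING_VALUE', /* required: ID or full ARN of the task to stop */
  cluster: 'STRING_VALUE', /* name or ARN of the cluster hosting the task */
  reason: 'STRING_VALUE' /* optional message recorded on the stopped task */
};
ecs.stopTask(params, function(err, data) {
  if (err) console.log(err, err.stack); // an error occurred
  else console.log(data); // successful response
});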
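To run that stopTask call from a state machine step, it can be wrapped in a Lambda handler. The following is only a minimal sketch, assuming the AWS SDK for JavaScript v2. The makeStopTaskHandler helper, the taskArn input field and the ECS_CLUSTER environment variable are hypothetical names chosen for illustration; none of them appear in the original configuration.

```javascript
// Hypothetical helper: builds a Lambda-style handler around any client that
// exposes stopTask(params).promise(), as the AWS SDK v2 ECS client does.
function makeStopTaskHandler(ecs, cluster) {
  return async function handler(event) {
    // Assumption: the state input carries the task ARN under `taskArn`.
    const taskArn = event && event.taskArn;
    if (!taskArn) {
      throw new Error('taskArn missing from state input');
    }
    await ecs.stopTask({
      task: taskArn,
      cluster: cluster,
      reason: 'Step Functions state timed out'
    }).promise();
    // Returned value becomes the state's result.
    return { stoppedTask: taskArn };
  };
}

// In a real Lambda the wiring would look roughly like this:
// const AWS = require('aws-sdk');
// exports.handler = makeStopTaskHandler(new AWS.ECS(), process.env.ECS_CLUSTER);
```

Passing the client in as an argument keeps the handler easy to exercise locally with a stub, without touching a real cluster.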
# State Machine
StartAt: FargateWorker
States:
  FargateWorker:
    Type: Task
    Resource: arn:aws:states:::ecs:runTask.waitForTaskToken
    InputPath: $
    ResultPath: $.workerResult
    OutputPath: $
    TimeoutSeconds: 3
    Parameters:
      Cluster: "#{EcsCluster}"
      TaskDefinition: "#{EcsTaskDefinition}"
      LaunchType: FARGATE
      EnableExecuteCommand: true
      NetworkConfiguration:
        AwsvpcConfiguration:
          Subnets:
            - xxx
            - yyy
            - zzz
          AssignPublicIp: DISABLED
      Overrides:
        ContainerOverrides:
          - Name: container-${env:STACK_NAME}
            Environment:
              - Name: TASK_TOKEN
                "Value.$": $$.Task.Token
    Catch:
      - ErrorEquals: ["States.Timeout"]
        Next: StopTimedOutTask
    Next: Done
  StopTimedOutTask:
    Type: Task
    Resource:
      Fn::GetAtt:
        - initializer
        - Arn
    ResultPath: $.filesInfo
    Next: ArchiveTransformAndSave
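The Lambda behind StopTimedOutTask still has to know which task to stop. If the task ARN is not available in the state input, one possible approach is to look the task up before stopping it. The sketch below assumes the AWS SDK for JavaScript v2 and assumes the running task can be recognised by its startedBy value. How the task gets a recognisable startedBy value is left open here and should be checked against your own setup. stopTasksStartedBy is a hypothetical helper name.

```javascript
// Hypothetical helper: stops every RUNNING task in `cluster` whose startedBy
// value equals `startedBy`. `ecs` is any client exposing the AWS SDK v2 style
// listTasks(...).promise() and stopTask(...).promise() calls.
async function stopTasksStartedBy(ecs, cluster, startedBy) {
  // Only the first page of results is read; add pagination via nextToken
  // if more tasks than one page can match.
  const listed = await ecs.listTasks({
    cluster: cluster,
    startedBy: startedBy,
    desiredStatus: 'RUNNING'
  }).promise();
  const taskArns = listed.taskArns || [];

  // Stop all matching tasks in parallel.
  await Promise.all(taskArns.map((taskArn) =>
    ecs.stopTask({
      cluster: cluster,
      task: taskArn,
      reason: 'Step Functions state timed out'
    }).promise()
  ));
  return taskArns;
}
```

A handler could call it as stopTasksStartedBy(new AWS.ECS(), clusterName, someIdentifier) and return the list of stopped ARNs as the state's result.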