
Pod time-limit in Kubernetes job — .spec.activeDeadlineSeconds per pod

Original post: 2020-04-28 14:30:17 · Tags: kubernetes, containers, kubectl

As explained in the Kubernetes docs on jobs:

The activeDeadlineSeconds applies to the duration of the job, no matter how many Pods are created. Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded.
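To make the quoted behaviour concrete, here is a minimal sketch that builds a Job manifest as a plain Python dict (printable as JSON for kubectl). The job name, container name, image, and the 600-second / 3-pod values are hypothetical. The point is where the field sits: directly under the Job's spec, next to parallelism, not inside the Pod template, which is why it bounds the Job as a whole rather than any single Pod.

```python
import json


def job_manifest(deadline_s: int, parallelism: int) -> dict:
    """Build a batch/v1 Job manifest as a plain dict (names and image are hypothetical)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "queue-job"},
        "spec": {
            # Job-level deadline: one clock for the whole Job, however many Pods
            # it creates. When it expires, running Pods are terminated and the
            # Job fails with reason DeadlineExceeded.
            "activeDeadlineSeconds": deadline_s,
            "parallelism": parallelism,
            "template": {
                "spec": {
                    # Jobs only allow Never or OnFailure here.
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": "worker", "image": "example.com/queue-worker:latest"}
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # kubectl apply -f accepts JSON manifests as well as YAML.
    print(json.dumps(job_manifest(600, 3), indent=2))
```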

However, what I want is to limit the running time of each pod. If a pod takes too long, I want it to fail, but I want the other pods to continue and the job to create more pods if necessary.

I'll explain a bit about my task, just to make the problem crystal clear. The job consists of taking items from a Redis database, where the database serves as a sort of queue. Each pod processes one item (well, the number might vary). If a pod takes too long processing an item, I want it to fail. However, the other pods should continue, and the job should continue creating pods and retrieving more items from the database.

1 Answer (accepted, 2020-04-28 17:39:35)

Your use case seems identical to this example from the Kubernetes docs.
As you said, activeDeadlineSeconds is not the parameter you should be using here.

I'm not sure why you want the pod to fail if it can't process an item within a given time frame. I see a few different approaches you could take here, but more information about the nature of your problem is needed to know which one fits. One approach would be to set the job's parallelism to the number of pods you'd like to run concurrently and implement the time-limit behaviour in the code itself:

If the issue delaying the processing is transient, you would probably want to terminate the current transaction, keep the item in your queue, and restart handling the same item.
If the same item has failed x times, it should be removed from the queue and pushed to some kind of dead letter queue to await troubleshooting later.
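The two rules above can be sketched as a single worker loop. This is a minimal, self-contained sketch under stated assumptions: a collections.deque stands in for the Redis list, a plain list plays the dead letter queue, process() is a hypothetical cooperative task that checks its own deadline (Python threads cannot be forcibly killed, so a real worker would more likely rely on timeouts in its client libraries), and MAX_ATTEMPTS and ITEM_TIMEOUT_S are made-up values for the answer's "x times" and time frame.

```python
import time
from collections import deque

MAX_ATTEMPTS = 3       # hypothetical value for "failed x times"
ITEM_TIMEOUT_S = 0.1   # hypothetical per-item time budget, in seconds


class ItemTimeout(Exception):
    """Raised when an item exceeds its time budget."""


def process(item: str, deadline: float) -> str:
    """Hypothetical cooperative task that checks its own deadline.

    Items starting with "slow" simulate a stall that always exceeds the budget.
    """
    steps = 20 if item.startswith("slow") else 1
    for _ in range(steps):
        if time.monotonic() > deadline:
            raise ItemTimeout(item)
        time.sleep(0.01)
    return item.upper()


def run_worker(queue: deque, dead_letter: list, attempts: dict) -> list:
    """Drain the queue, retrying timed-out items and dead-lettering repeat offenders."""
    results = []
    while queue:
        item = queue.popleft()
        deadline = time.monotonic() + ITEM_TIMEOUT_S
        try:
            results.append(process(item, deadline))
        except ItemTimeout:
            attempts[item] = attempts.get(item, 0) + 1
            if attempts[item] >= MAX_ATTEMPTS:
                dead_letter.append(item)   # park it for later troubleshooting
            else:
                queue.appendleft(item)     # keep it queued; retry the same item next
    return results


if __name__ == "__main__":
    q = deque(["a", "slow-1", "b"])
    dlq, attempts = [], {}
    print(run_worker(q, dlq, attempts), dlq, attempts)
    # ['A', 'B'] ['slow-1'] {'slow-1': 3}
```

Putting the item back at the head of the queue matches "restart handling the same item"; appending it to the tail instead would let other items make progress while a transient problem clears.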

Another approach would be to fan out the messages in the queue so that a worker pod is spawned for each message, as this example depicts.
Choosing this solution will indeed cause every pod that takes too long to process its item to fail, and if you set the restartPolicy of the pods you create to Never, you should end up with a list of failed pods that corresponds to the number of items that failed processing.
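A minimal sketch of the fan-out variant: generate one Job per queue item, each with its Pod restartPolicy set to Never. Several details are assumptions, not part of the answer: the process-item-N names, the image, the ITEM environment variable used to hand over the item, the 300-second budget, and backoffLimit set to 0 so a failing item is not retried. Because each Job handles a single item, the Job-level activeDeadlineSeconds effectively becomes a per-item time limit, and one slow item fails only its own Job.

```python
import json

ITEM_DEADLINE_S = 300                              # hypothetical per-item time budget
WORKER_IMAGE = "example.com/queue-worker:latest"   # hypothetical image


def job_for_item(index: int, item: str) -> dict:
    """Build one batch/v1 Job that processes a single queue item."""
    labels = {"app": "queue-worker"}
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"process-item-{index}", "labels": labels},
        "spec": {
            # One item per Job, so the Job-level deadline acts as a per-item limit.
            "activeDeadlineSeconds": ITEM_DEADLINE_S,
            # Assumption: no retries, so each failing item ends as one failed Job.
            "backoffLimit": 0,
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "worker",
                        "image": WORKER_IMAGE,
                        # Hypothetical way of handing the item to the worker.
                        "env": [{"name": "ITEM", "value": item}],
                    }],
                },
            },
        },
    }


def fan_out(items: list) -> dict:
    """Wrap one Job per item in a v1 List, which kubectl apply -f can consume."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [job_for_item(i, item) for i, item in enumerate(items)],
    }


if __name__ == "__main__":
    print(json.dumps(fan_out(["item-a", "item-b"]), indent=2))
```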

Having said all that, I don't think failing the pods is the right approach here; keeping track of failed processing events should be done through instrumentation, either container logs or metrics.
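On the instrumentation side, here is a minimal sketch of what tracking failed processing events could look like inside the worker: one structured log line per failure plus a counter. The collections.Counter is only a stand-in for a real metrics client, and the event and field names are hypothetical. Container stdout and stderr are what kubectl logs shows, so a JSON line per failure is easy to search or forward later.

```python
import json
import logging
from collections import Counter

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("queue-worker")

# Stand-in for a metrics backend: failure counts keyed by reason (hypothetical).
failures = Counter()


def record_failure(item_id: str, reason: str, elapsed_s: float) -> None:
    """Count a failed processing event and emit one structured log line."""
    failures[reason] += 1
    log.warning(json.dumps({
        "event": "item_failed",
        "item": item_id,
        "reason": reason,
        "elapsed_s": round(elapsed_s, 3),
    }))


if __name__ == "__main__":
    record_failure("item-42", "timeout", 31.2)
    record_failure("item-43", "timeout", 30.9)
    record_failure("item-44", "bad_payload", 0.1)
    print(dict(failures))  # {'timeout': 2, 'bad_payload': 1}
```

The worker keeps running after each failure, which is the trade-off the answer argues for: the pod stays healthy, and the failure record lives in logs and metrics instead of in pod status.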


