
Kubernetes Job Cleanup

From what I understand, the Job object is supposed to reap pods after a certain amount of time. But on my GKE cluster (Kubernetes 1.1.8), it seems that "kubectl get pods -a" can list pods from days ago.

All were created using the Jobs API.

I did notice that after deleting the job with kubectl delete jobs, the pods were deleted too.

My main concern here is that I am going to run thousands or tens of thousands of pods on the cluster in batch jobs, and I don't want to overload the internal backlog system.

It looks like starting with Kubernetes 1.6 (and the v2alpha1 API version), if you're using CronJobs to create the jobs (which, in turn, create your pods), you'll be able to limit how many old jobs are kept. Just add the following to your CronJob spec:

successfulJobsHistoryLimit: X
failedJobsHistoryLimit: Y

Where X and Y are the limits for how many previously run jobs the system should keep around (by default it keeps jobs around indefinitely, at least on version 1.5).
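
For context, here is a minimal CronJob sketch showing where these fields sit (the name, schedule, and container are illustrative, and the exact apiVersion depends on your cluster version, e.g. batch/v2alpha1 on 1.6):

apiVersion: batch/v2alpha1
kind: CronJob
metadata:
  name: example-with-history-limits
spec:
  schedule: "*/5 * * * *"
  successfulJobsHistoryLimit: 3   # keep only the 3 most recent successful jobs
  failedJobsHistoryLimit: 1       # keep only the most recent failed job
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            command: ["echo", "hello"]
          restartPolicy: Never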

Edit 2018-09-29:

For newer K8s versions, the documentation for these fields has been updated; see the current CronJob documentation.

It's true that you used to have to delete jobs manually. @puja's answer was correct at the time of writing.

Kubernetes 1.12.0 released a TTL feature (in alpha) that can automatically clean up jobs a specified number of seconds after completion (see the changelog). You can set it to zero for immediate cleanup. See the Jobs docs.

Example from the doc:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-ttl
spec:
  ttlSecondsAfterFinished: 100  # delete this Job (and its pods) 100 seconds after it finishes
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
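
Note that while this feature is in alpha it is gated, so on a self-managed control plane you would have to enable the TTLAfterFinished feature gate before the field has any effect (flags shown below assume a self-managed cluster; managed offerings may not expose them):

kube-apiserver --feature-gates=TTLAfterFinished=true ...
kube-controller-manager --feature-gates=TTLAfterFinished=true ...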

I recently built a Kubernetes operator to do this task.

After deployment it monitors the selected namespace and deletes completed jobs/pods if they completed without errors/restarts.

https://github.com/lwolf/kube-cleanup-operator

This is the intended behaviour of Jobs even in Kubernetes 1.3. Both the job and its pods stay in the system until you delete them manually. This gives you a way to see the results of the pods (i.e., through their logs) that were not already shipped elsewhere by some other mechanism, or to check for errors, warnings, or other diagnostic output.

The recommended/official way to get rid of the pods is to delete the job, as you mentioned above. Using the garbage collector would only delete the pods, but the job itself would still be in the system.
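
For example (the job name is a placeholder):

kubectl delete job <job-name>   # deleting the job also removes the pods it created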

If you don't want to delete the job manually, you could write a little script that runs in your cluster, checks for completed jobs, and deletes them. Sadly, ScheduledJobs are only coming in 1.4, but you could run the script in a normal pod instead.
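
A minimal sketch of such a cleanup script, assuming it runs somewhere with kubectl configured and a service account that can list and delete jobs in the target namespace:

#!/bin/sh
# Print every Job with its .status.succeeded count, then delete the ones that finished.
kubectl get jobs -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.succeeded}{"\n"}{end}' \
  | awk '$2 >= 1 {print $1}' \
  | xargs -r kubectl delete job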

In Kubernetes v1.2, there is a garbage collector that reclaims terminated pods, with a global threshold of --terminated-pod-gc-threshold=12500 (see the flags on the controller manager). I am not aware of any GC mechanism for terminated pods in v1.1.8. You may want to run a script/pod to periodically clean up pods/jobs, to keep the master components from being flooded. By the way, there is an open issue to automatically adjust the GC threshold.
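
For illustration, that threshold is just a flag on the controller manager, so on a self-managed cluster you could lower it; the value below is arbitrary, and on GKE the master flags are not user-configurable:

kube-controller-manager --terminated-pod-gc-threshold=100 ...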
