Kubernetes job vs pod in ContainerCreating state
I am trying to figure out whether there is a way to force a pod that is stuck in the ContainerCreating state (for valid reasons, such as being unable to mount an inaccessible NFS share) to move to a failed state after a specific amount of time.
I have Kubernetes jobs that I'm running through a Jenkins pipeline. I use the job state (type: Complete|Failed) to determine the outcome, and then I gather the results of the jobs (kubectl get pods + kubectl logs). This works well as long as the pods go into a known failed state such as ContainerCannotRun or BackoffLimitExceeded, so that the job state goes to failed.
The problem arises when a pod goes into the ContainerCreating state and stays there. The job state then stays active and never changes. Is there a way to put something in the job manifest that forces a pod in the ContainerCreating state to move to a failed state after a certain amount of time?
Example: pod status
- image: myimage
  imageID: ""
  lastState: {}
  name: primary
  ready: false
  restartCount: 0
  state:
    waiting:
      reason: ContainerCreating
hostIP: x.y.z.y
phase: Pending
qosClass: BestEffort
startTime: "2020-05-06T17:09:58Z"
job status
active: 1
startTime: "2020-05-06T17:09:58Z"
Thanks for any input.
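As the Kubernetes Job documentation describes, use activeDeadlineSeconds or backoffLimit. For a pod stuck in ContainerCreating, activeDeadlineSeconds is the setting that matters: the pod never fails, so it never counts against backoffLimit.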
The activeDeadlineSeconds applies to the duration of the job, no matter how many Pods are created. Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded.
Once backoffLimit has been reached, the Job will be marked as failed and any running Pods will be terminated.
Note that a Job's activeDeadlineSeconds takes precedence over its backoffLimit. Therefore, a Job that is retrying one or more failed Pods will not deploy additional Pods once it reaches the time limit specified by activeDeadlineSeconds, even if the backoffLimit is not yet reached.
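apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-timeout
spec:
  # Retry failed Pods at most 5 times before marking the Job failed.
  backoffLimit: 5
  # Fail the whole Job (reason: DeadlineExceeded) after 100 seconds.
  activeDeadlineSeconds: 100
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never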
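With a deadline in place, the Jenkins pipeline can read the Job's conditions and no longer waits forever on active. The following is only a minimal sketch of one way to do that, not a definitive implementation. The helper name job_outcome is hypothetical and not part of kubectl. It assumes kubectl is on the PATH and pointed at the right namespace, and it treats a Job with no Complete or Failed condition set to True as still active.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubectl): prints "complete", "failed",
# or "active" for the Job whose name is passed as the first argument.
job_outcome() {
  job_name="$1"
  # Collect the types of all Job conditions whose status is "True".
  # A finished Job carries either "Complete" or "Failed" here.
  conditions=$(kubectl get job "$job_name" \
    -o jsonpath='{.status.conditions[?(@.status=="True")].type}')
  case "$conditions" in
    *Complete*) echo "complete" ;;
    *Failed*)   echo "failed" ;;
    *)          echo "active" ;;   # assumption: no terminal condition yet
  esac
}
```

A pipeline step could call job_outcome pi-with-timeout in a polling loop and stop as soon as the result is no longer active. With activeDeadlineSeconds set, a pod stuck in ContainerCreating should eventually cause the result to become failed.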