Kubernetes job vs pod in ContainerCreating state

I am trying to figure out if there is a way to force a pod that is stuck in the ContainerCreating state (for valid reasons, such as being unable to mount an inaccessible NFS share) to move to a failed state after a specific amount of time.

I have Kubernetes jobs that I'm running through a Jenkins pipeline. I'm using the job state (type: completed|failed) to determine the outcome, and then I gather the results of the jobs (kubectl get pods + kubectl logs). It works well as long as the pods go into a known failed state like ContainerCannotRun or Backofflimit, so that the job state goes to failed.

The problem arises when a pod goes into the ContainerCreating state and stays that way. The job state then stays active and never changes. Is there a way to put something in the job manifest that forces a pod in the ContainerCreating state to move to a failed state after a certain amount of time?

Example: pod status

    - image: myimage
      imageID: ""
      lastState: {}
      name: primary
      ready: false
      restartCount: 0
      state:
        waiting:
          reason: ContainerCreating
    hostIP: x.y.z.y
    phase: Pending
    qosClass: BestEffort
    startTime: "2020-05-06T17:09:58Z"

Job status

    active: 1
    startTime: "2020-05-06T17:09:58Z"

Thanks for any input.

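As documented for Kubernetes Jobs, use activeDeadlineSeconds or backoffLimit. For a pod stuck in ContainerCreating, activeDeadlineSeconds is the relevant setting: the pod never actually fails, so it never counts against backoffLimit.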

The activeDeadlineSeconds applies to the duration of the job, no matter how many Pods are created. Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded.

Once backoffLimit has been reached, the Job will be marked as failed and any running Pods will be terminated.

Note that a Job's activeDeadlineSeconds takes precedence over its backoffLimit. Therefore, a Job that is retrying one or more failed Pods will not deploy additional Pods once it reaches the time limit specified by activeDeadlineSeconds, even if the backoffLimit is not yet reached.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: pi-with-timeout
    spec:
      backoffLimit: 5
      activeDeadlineSeconds: 100
      template:
        spec:
          containers:
          - name: pi
            image: perl
            command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
          restartPolicy: Never
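
With a deadline set, the pipeline check described in the question can rely on the Job eventually reaching a terminal condition. The following is only a minimal sketch in Python of how that check might look, assuming the Job's status has already been loaded as a dict (for example from kubectl get job -o json). The helper name job_outcome and the sample dicts are hypothetical and not part of the original post.

```python
from typing import Optional, Tuple


def job_outcome(status: dict) -> Tuple[str, Optional[str]]:
    """Classify a Job's .status as 'complete', 'failed', or 'active'.

    Returns (outcome, reason); reason is only set for failed Jobs.
    job_outcome is a hypothetical helper, not a Kubernetes API.
    """
    for cond in status.get("conditions") or []:
        # A condition only counts when its status is the string "True".
        if cond.get("status") != "True":
            continue
        if cond.get("type") == "Complete":
            return ("complete", None)
        if cond.get("type") == "Failed":
            return ("failed", cond.get("reason"))
    # No terminal condition yet: the Job is still running or waiting.
    return ("active", None)


# The stuck Job from the question: no conditions, so it stays "active".
stuck = {"active": 1, "startTime": "2020-05-06T17:09:58Z"}

# Assumed shape of the status once activeDeadlineSeconds has expired.
timed_out = {
    "conditions": [
        {"type": "Failed", "status": "True", "reason": "DeadlineExceeded"}
    ],
    "startTime": "2020-05-06T17:09:58Z",
}

print(job_outcome(stuck))
print(job_outcome(timed_out))
```

Returning the reason alongside the outcome lets the pipeline tell a timeout (DeadlineExceeded) apart from other failures before it decides whether collecting pod logs is worthwhile.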
