Kubernetes job and pod in containerCreating state

Question

I am trying to figure out whether there is a way to force a pod that is stuck in the containerCreating state (for valid reasons, such as being unable to mount an inaccessible NFS share) to move to a failed state after a specific amount of time.

I have Kubernetes jobs that I'm running through a Jenkins pipeline. I'm using the job state ( type: completed|failed ) to determine the outcome, and then I gather the results of the jobs ( kubectl get pods + kubectl logs ). This works well as long as the pods go into a known failed state like ContainerCannotRun or Backofflimit, so that the job state goes to failed.

The problem arises when a pod goes into the containerCreating state and stays that way. The job state then stays active and never changes. Is there a way to put something in the job manifest that forces a pod in the containerCreating state to move to a failed state after a certain amount of time?

Example: pod status

    - image: myimage
      imageID: ""
      lastState: {}
      name: primary
      ready: false
      restartCount: 0
      state:
        waiting:
          reason: ContainerCreating
    hostIP: x.y.z.y
    phase: Pending
    qosClass: BestEffort
    startTime: "2020-05-06T17:09:58Z"

job status

    active: 1
    startTime: "2020-05-06T17:09:58Z"

Thanks for any input.

Answer 1

As documented in the Kubernetes documentation on Jobs, use activeDeadlineSeconds or backoffLimit:

The activeDeadlineSeconds applies to the duration of the job, no matter how many Pods are created. Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded.
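
On the pipeline side, the outcome can then be read from the Job's status conditions. The following is only a sketch, assuming the status block has already been loaded into a dict (for example from the output of kubectl get job -o json); the function name job_outcome is made up for illustration and is not part of any Kubernetes client.

```python
from typing import Optional, Tuple


def job_outcome(status: dict) -> Tuple[str, Optional[str]]:
    """Classify a Job's .status block.

    Returns ("failed" | "complete" | "active", reason). The reason is whatever
    the matching condition carries, e.g. "DeadlineExceeded", or None.
    """
    for cond in status.get("conditions") or []:
        # A condition only counts when its status is the string "True".
        if cond.get("status") != "True":
            continue
        if cond.get("type") == "Failed":
            return "failed", cond.get("reason")
        if cond.get("type") == "Complete":
            return "complete", cond.get("reason")
    # No terminal condition yet: the Job is still considered active.
    return "active", None


if __name__ == "__main__":
    # The job status from the question: no conditions, so still active.
    print(job_outcome({"active": 1, "startTime": "2020-05-06T17:09:58Z"}))
```

With the status shown in the question, this reports the job as active; once the deadline has been enforced, the Failed condition with reason DeadlineExceeded is what the pipeline would see.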

Once backoffLimit has been reached, the Job will be marked as failed and any running Pods will be terminated.

Note that a Job's activeDeadlineSeconds takes precedence over its backoffLimit. Therefore, a Job that is retrying one or more failed Pods will not deploy additional Pods once it reaches the time limit specified by activeDeadlineSeconds, even if the backoffLimit is not yet reached.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: pi-with-timeout
    spec:
      backoffLimit: 5
      activeDeadlineSeconds: 100
      template:
        spec:
          containers:
          - name: pi
            image: perl
            command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
          restartPolicy: Never
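
To get a feel for what the timeout means for the job in the question: activeDeadlineSeconds is counted from the Job's startTime, so a pod that sits in containerCreating uses up the same budget as a running one. As far as I can tell, a pod that never starts does not record a failure, so activeDeadlineSeconds rather than backoffLimit is the setting that helps in that case. The sketch below is just the arithmetic, not how the controller is implemented; job_deadline and deadline_passed are hypothetical helper names, and the controller may act slightly after the computed instant.

```python
from datetime import datetime, timedelta, timezone


def job_deadline(start_time: str, active_deadline_seconds: int) -> datetime:
    """Return the instant at which the Job's active deadline elapses.

    start_time is the Job's status.startTime in the form "2020-05-06T17:09:58Z".
    """
    started = datetime.strptime(start_time, "%Y-%m-%dT%H:%M:%SZ")
    started = started.replace(tzinfo=timezone.utc)
    return started + timedelta(seconds=active_deadline_seconds)


def deadline_passed(start_time: str, active_deadline_seconds: int, now: datetime) -> bool:
    """True once `now` is at or past the computed deadline."""
    return now >= job_deadline(start_time, active_deadline_seconds)


if __name__ == "__main__":
    # startTime from the question, activeDeadlineSeconds from the manifest above.
    print(job_deadline("2020-05-06T17:09:58Z", 100).isoformat())
```

For the startTime in the question and a 100 second deadline, the job would be expected to go to Failed shortly after 17:11:38 UTC, even if the pod is still in containerCreating.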
