
kubernetes pending pod priority

I have the following pods on my kubernetes (1.18.3) cluster:

NAME      READY   STATUS    RESTARTS   AGE
pod1      1/1     Running   0          14m
pod2      1/1     Running   0          14m
pod3      0/1     Pending   0          14m
pod4      0/1     Pending   0          14m

pod3 and pod4 cannot start because the node has capacity for 2 pods only. When pod1 finishes and quits, then the scheduler picks either pod3 or pod4 and starts it. So far so good.

However, I also have a high priority pod (hpod) that I'd like to start before pod3 or pod4 when either of the running pods finishes and quits.

So I created a PriorityClass like the one shown in the Kubernetes docs:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-no-preemption
value: 1000000
preemptionPolicy: Never
globalDefault: false
description: "This priority class should be used for XYZ service pods only."

I've created the following pod yaml:

apiVersion: v1
kind: Pod
metadata:
  name: hpod
  labels:
    app: hpod
spec:
  containers:
  - name: hpod
    image: ...
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
      limits:
        cpu: "500m"
        memory: "500Mi"
  priorityClassName: high-priority-no-preemption

The problem is that when I start the high-priority pod with kubectl apply -f hpod.yaml, the scheduler terminates a running pod to let the high-priority pod start, even though I've set 'preemptionPolicy: Never'.

The expected behaviour would be to postpone starting hpod until a currently running pod finishes. And when it does, then let hpod start before pod3 or pod4.

What am I doing wrong?

Prerequisites:

This solution was tested on Kubernetes v1.18.3, Docker 19.03 and Ubuntu 18. A text editor is also required (e.g. sudo apt-get install vim).

In the Kubernetes documentation, under How to disable preemption, you can find this note:

Note: In Kubernetes 1.15 and later, if the feature NonPreemptingPriority is enabled, PriorityClasses have the option to set preemptionPolicy: Never. This will prevent pods of that PriorityClass from preempting other pods.

Also, under Non-preempting PriorityClass, the documentation states:

The use of the PreemptionPolicy field requires the NonPreemptingPriority feature gate to be enabled.

If you then check the Feature Gates reference, you will find that NonPreemptingPriority defaults to false, so the feature is disabled by default.

Output with your current configuration:

$ kubectl get pods
NAME             READY   STATUS    RESTARTS   AGE
nginx-normal     1/1     Running   0          32s
nginx-normal-2   1/1     Running   0          32s
$ kubectl apply -f prio.yaml
pod/nginx-priority created
$ kubectl get pods
NAME             READY   STATUS    RESTARTS   AGE
nginx-normal-2   1/1     Running   0          48s
nginx-priority   1/1     Running   0          8s

To make preemptionPolicy: Never take effect, you need to add --feature-gates=NonPreemptingPriority=true to three files:

/etc/kubernetes/manifests/kube-apiserver.yaml

/etc/kubernetes/manifests/kube-controller-manager.yaml

/etc/kubernetes/manifests/kube-scheduler.yaml

To check whether the feature gate is enabled, you can use these commands:

ps aux | grep apiserver | grep feature-gates
ps aux | grep scheduler | grep feature-gates
ps aux | grep controller-manager | grep feature-gates
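
The three checks above can be folded into one small script. This is only a sketch: gate_enabled and check_components are hypothetical helper names, and it assumes the control-plane processes show up in the host process list with a command line that starts with the component name, as in the ps output in this answer.

```shell
#!/bin/sh
# gate_enabled CMDLINE: succeed if the command line turns NonPreemptingPriority on.
# Also accepts a comma-separated gate list such as --feature-gates=A=true,B=true.
# (Hypothetical helper, not part of any Kubernetes tooling.)
gate_enabled() {
    printf '%s\n' "$1" |
        grep -Eq -- '--feature-gates=([^ ]*,)?NonPreemptingPriority=true(,| |$)'
}

# check_components: print the gate state of each control-plane component
# found in the process list of the current node.
check_components() {
    for component in kube-apiserver kube-controller-manager kube-scheduler; do
        # Take the first process whose command line starts with the component name.
        cmdline=$(ps -eo args | grep "^$component" | head -n 1)
        if [ -z "$cmdline" ]; then
            echo "$component: not running on this node"
        elif gate_enabled "$cmdline"; then
            echo "$component: NonPreemptingPriority enabled"
        else
            echo "$component: NonPreemptingPriority not set"
        fi
    done
}
```

Running check_components on the control-plane node prints one line per component, so a missing flag on any of the three is easy to spot.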

For a detailed explanation of why you have to edit those files, see this GitHub thread.

$ sudo su
# cd /etc/kubernetes/manifests/
# ls
etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml

Use your text editor to add the feature gate to those files:

# vi kube-apiserver.yaml

and add - --feature-gates=NonPreemptingPriority=true under spec.containers.command, as in the example below:

spec:
  containers:
  - command:
    - kube-apiserver
    - --feature-gates=NonPreemptingPriority=true
    - --advertise-address=10.154.0.31

Do the same with the other two files. After that, you can check whether the flags were applied:

$ ps aux | grep apiserver | grep feature-gates
root     26713 10.4  5.2 565416 402252 ?       Ssl  14:50   0:17 kube-apiserver --feature-gates=NonPreemptingPriority=true --advertise-address=10.154.0.31 

Now you have to redeploy your PriorityClass.

$ kubectl get priorityclass
NAME                          VALUE        GLOBAL-DEFAULT   AGE
high-priority-no-preemption   1000000      false            12m
system-cluster-critical       2000000000   false            23m
system-node-critical          2000001000   false            23m
$ kubectl delete priorityclass high-priority-no-preemption
priorityclass.scheduling.k8s.io "high-priority-no-preemption" deleted
$ kubectl apply -f class.yaml 
priorityclass.scheduling.k8s.io/high-priority-no-preemption created

The last step is to deploy a pod with this PriorityClass.

TEST

$ kubectl get po
NAME             READY   STATUS    RESTARTS   AGE
nginx-normal     1/1     Running   0          4m4s
nginx-normal-2   1/1     Running   0          18m
$ kubectl apply -f prio.yaml 
pod/nginx-priority created
$ kubectl get po
NAME             READY   STATUS    RESTARTS   AGE
nginx-normal     1/1     Running   0          5m17s
nginx-normal-2   1/1     Running   0          20m
nginx-priority   0/1     Pending   0          67s
$ kubectl delete po nginx-normal-2
pod "nginx-normal-2" deleted
$ kubectl get po
NAME             READY   STATUS    RESTARTS   AGE
nginx-normal     1/1     Running   0          5m55s
nginx-priority   1/1     Running   0          105s
