
Kubernetes taint on master but no scheduling on worker node

I have an issue on my Kubernetes (K3s) cluster:

0/4 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 2 node(s) had taint {k3s-controlplane: true}, that the pod didn't tolerate.
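To double-check which nodes actually carry that taint, a quick look at the node descriptions does the job (assuming kubectl is pointed at this cluster):

kubectl describe nodes | grep -E 'Name:|Taints:'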

To describe how that happened: I have 4 K3s servers, 3 of them control-plane nodes and 1 a worker.

Originally no node had taints, so every pod could schedule on any node.

I wanted to change that and taint my master nodes, so I added the taint k3s-controlplane=true:NoSchedule on 2 of the nodes.
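(For reference, a taint like that is normally applied with kubectl, roughly:

kubectl taint nodes baal-01 k3s-controlplane=true:NoSchedule
kubectl taint nodes baal-02 k3s-controlplane=true:NoSchedule

K3s can also set node taints at startup via its node configuration, but the effect is the same.)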

To test it, I restarted one deployment, and now that pod won't schedule.
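(For the record, the restart was just a standard rollout restart of the deployment shown below, along the lines of:

kubectl rollout restart deployment/test
)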

As I understand it, the pod should schedule on the untainted nodes by default, but that doesn't seem to be the case.

New deployments work fine.

So I guess there is something in this deployment's history that is causing the issue. The deployment is fairly simple:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: test
    spec:
      nodeSelector:
        type: "slow"      
      containers:
      - env:
        - name: PUID
          value: "1000"
        - name: GUID
          value: "1000"
        - name: TZ
          value: Europe/Paris
        - name: AUTO_UPDATE
          value: "true"
        image: test/test
        imagePullPolicy: Always
        name: test
        volumeMounts:
        - mountPath: /config
          name: vol0
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "256Mi"
            cpu: "500m"
      volumes:
      - name: vol0
        persistentVolumeClaim:
          claimName: test-config-lh

Well, this particular deployment has a nodeSelector, type: "slow", and that is exactly the label carried by the two nodes I tainted...

If I run this command:

kubectl get nodes --show-labels
NAME      STATUS   ROLES                       AGE    VERSION        LABELS
baal-01   Ready    control-plane,etcd,master   276d   v1.22.5+k3s1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=k3s,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=baal-01,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=k3s,type=slow
baal-02   Ready    control-plane,etcd,master   276d   v1.22.5+k3s1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=k3s,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=baal-02,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=k3s,type=slow
lamia01   Ready    control-plane,etcd,master   187d   v1.22.5+k3s1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=k3s,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=lamia01,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=k3s,type=fast
lamia03   Ready    <none>                      186d   v1.22.5+k3s1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=k3s,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=lamia03,kubernetes.io/os=linux,node.kubernetes.io/instance-type=k3s,ram=full,type=fast

You can see the label type=slow on the two nodes baal-01 and baal-02, and those two nodes are the ones carrying the NoSchedule taint.
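A quicker way to see the overlap is to filter nodes by that label:

kubectl get nodes -l type=slow

Only baal-01 and baal-02 come back, and those are exactly the two nodes that now carry the NoSchedule taint.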

So the deployment was trying to schedule the pod on a node with the label type=slow, and none of the schedulable nodes had this label.

Sorry, I missed it...

So no issue there...
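For completeness: if I ever do want this deployment to run on those tainted master nodes (which are the only ones matching its nodeSelector), adding a matching toleration to the pod template spec should do it. A minimal sketch, placed alongside the nodeSelector:

      tolerations:
      - key: "k3s-controlplane"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"

With that, the taint no longer blocks scheduling, while the nodeSelector still pins the pod to the type=slow nodes.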
