Preventing the Kubernetes scheduler from running all pods on a single node of a cluster

Question

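I have a Kubernetes cluster with 4 nodes and one master. I am trying to run 5 nginx pods across all the nodes. Currently the scheduler sometimes runs all the pods on one machine and sometimes spreads them across different machines.

What happens if a node goes down while all my pods are running on it? We need to avoid this.

How can I force the scheduler to place pods on the nodes in round-robin fashion, so that if any node goes down, at least one other node still has an nginx pod running?

Is this possible? If so, how can we achieve it?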

Answer 1

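Use podAntiAffinity

podAntiAffinity with requiredDuringSchedulingIgnoredDuringExecution can be used to prevent replicas of the same pod from being scheduled onto the same hostname. If you prefer a more relaxed constraint, use preferredDuringSchedulingIgnoredDuringExecution.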

apiVersion: apps/v1   # extensions/v1beta1 Deployments are no longer served as of v1.16
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          # Hard requirement: do not schedule an "nginx" pod onto a node that already runs one.
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname   # anti-affinity scope is the host
            labelSelector:
              matchLabels:
                app: nginx
      containers:
      - name: nginx
        image: nginx:latest

Kubelet --max-pods

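You can specify the maximum number of pods per node in the kubelet configuration, so that if a node goes down, Kubernetes is prevented from saturating another node with the pods from the failed node.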
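
As a rough sketch of the kubelet setting mentioned above: when the kubelet is driven by a configuration file rather than the `--max-pods` flag, the same limit is the `maxPods` field of `KubeletConfiguration`. The value 40 is an arbitrary illustration, not a recommendation; the default is 110.

```yaml
# Kubelet configuration file (passed to the kubelet with --config).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Upper bound on pods this kubelet will run; 40 is an illustrative value only.
maxPods: 40
```

The limit counts every pod on the node, including DaemonSet and system pods, so it bounds saturation after a failure rather than spreading the nginx pods specifically.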

Answer 2

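I think the inter-pod anti-affinity feature will help you. Inter-pod anti-affinity lets you constrain which nodes your pod is eligible to be scheduled on, based on the labels of pods already running on the node. Here is an example.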

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: nginx-service
  name: nginx-service
spec:
  replicas: 3
  selector:
    matchLabels:
      run: nginx-service
  template:
    metadata:
      labels:
        run: nginx-service    # must match spec.selector
        service-type: nginx
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          # Each preferred term needs a weight (1-100) and a podAffinityTerm.
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: service-type
                  operator: In
                  values:
                  - nginx
              topologyKey: kubernetes.io/hostname
      containers:
      - name: nginx-service
        image: nginx:latest

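Note: I use preferredDuringSchedulingIgnoredDuringExecution here because there are more pods than nodes; with the required form, the pods that cannot get a node of their own would stay Pending.

For more details, see the "Inter-pod affinity and anti-affinity (beta feature)" section at: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/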

Answer 3

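Use Pod Topology Spread Constraints

As of 2021 (v1.19 and later), Pod Topology Spread Constraints (topologySpreadConstraints) are available by default, and I found them more suitable than podAntiAffinity for this case.

The main difference is that anti-affinity can only restrict placement to one pod per node, whereas Pod Topology Spread Constraints can limit it to N pods per node.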

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-example-deployment
spec:
  replicas: 6
  selector:
    matchLabels:
      app: nginx-example
  template:
    metadata:
      labels:
        app: nginx-example
    spec:
      containers:
      - name: nginx
        image: nginx:latest
      # maxSkew controls how evenly the pods are spread.
      # For example, with 3 nodes available and 6 replicas,
      # 2 pods are scheduled onto each node.
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: nginx-example

For more details, see KEP-895 and the official blog post.
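
Applied to the question's numbers (5 replicas, 4 worker nodes), the same constraint could be sketched as the pod-template fragment below. It assumes the four workers are the only nodes the scheduler counts as domains; tainted control-plane nodes may or may not enter the skew calculation depending on cluster version and settings. The `app: nginx` label is an assumption for illustration.

```yaml
# Fragment of spec.template.spec in a Deployment with replicas: 5.
topologySpreadConstraints:
- maxSkew: 1                          # max difference in matching pods between any two nodes
  topologyKey: kubernetes.io/hostname # each node is its own domain
  whenUnsatisfiable: DoNotSchedule    # leave the pod Pending rather than exceed maxSkew
  labelSelector:
    matchLabels:
      app: nginx                      # assumed pod label
# With 4 eligible nodes this permits a 2/1/1/1 placement (skew 2 - 1 = 1),
# but not 3/1/1/0 (skew 3 - 0 = 3), so losing one node leaves at least 3 pods running.
```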

Answer 4

If your containers specify resource requests for the amount of memory and CPU they need, the scheduler should spread your pods. See http://kubernetes.io/docs/user-guide/compute-resources/
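
A minimal sketch of what such requests look like on the nginx container; the CPU and memory figures are placeholder assumptions, not tuned values. The scheduler only places a pod on a node with enough unreserved capacity for its requests, which influences spreading but does not by itself guarantee one pod per node.

```yaml
# Fragment of a pod template (spec.template.spec of a Deployment).
containers:
- name: nginx
  image: nginx:latest
  resources:
    requests:
      cpu: 250m       # a quarter of a CPU core; placeholder value
      memory: 128Mi   # placeholder value
```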

Answer 5

We can use taints and tolerations to control whether or not pods are deployed onto a given node.


Tolerations are applied to pods, and allow (but do not require) the pods to schedule onto nodes with matching taints.

Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks that the node should not accept any pods that do not tolerate the taints.
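
For illustration, a taint as it appears on a Node object; the node name `node1` is a hypothetical placeholder, and the key/value pair mirrors the `key1`/`value1` toleration used in this answer's Deployment. The same taint is commonly applied with `kubectl taint nodes node1 key1=value1:NoSchedule`.

```yaml
apiVersion: v1
kind: Node
metadata:
  name: node1            # hypothetical node name
spec:
  taints:
  - key: key1
    value: value1
    effect: NoSchedule   # pods without a matching toleration are not scheduled here
```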

A sample Deployment YAML would look like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: nginx-service
  name: nginx-service
spec:
  replicas: 3
  selector:
    matchLabels:
      run: nginx-service
  template:
    metadata:
      labels:
        run: nginx-service    # must match spec.selector
        service-type: nginx
    spec:
      containers:
      - name: nginx-service
        image: nginx:latest
      # Allows (but does not require) scheduling onto nodes tainted key1=value1:NoSchedule.
      tolerations:
      - key: "key1"
        operator: "Equal"
        value: "value1"
        effect: "NoSchedule"

You can find more information at https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/#:~:text=Node%20affinity%2C%20is%20a%20property,onto%20nodes%20with%20matching%20taints

5 answers

Answer 1  19  2018-03-18 10:59:47
Answer 2  10  2017-04-01 05:17:01
Answer 3  2  2021-01-18 07:48:19
Answer 4  0  2016-06-14 07:11:53
Answer 5  0  2021-05-04 04:52:54
