
GKE Kubernetes Node Pool Upgrade very slow

I am experimenting with GKE cluster upgrades on a 6-node test cluster (two node pools) before trying it on our staging or production clusters. When I upgraded with only a 12-replica nginx deployment, the nginx ingress controller, and cert-manager (installed as a helm chart), it took 10 minutes per node pool (3 nodes). I was very satisfied. I then decided to try again with something that looks more like our real setup. I removed the nginx deployment and added 2 node.js deployments plus the following helm charts: mongodb-0.4.27, mcrouter-0.1.0 (as a statefulset), redis-ha-2.0.0, and my own www-redirect-0.0.1 chart (a simple nginx that does a redirect). The problem seems to be with mcrouter. Once a node starts draining, its status changes to Ready,SchedulingDisabled (which seems normal), but the following pods remain on it:

  • mcrouter-memcached-0
  • fluentd-gcp-v2.0.9-4f87t
  • kube-proxy-gke-test-upgrade-cluster-default-pool-74f8edac-wblf

I do not know why those two kube-system pods remain, but the mcrouter one is mine and it will not go away quickly enough. If I wait long enough (1 hour+) it eventually works, though I am not sure why. The current node pool (of 3 nodes) started upgrading 2 hours 46 minutes ago; 2 nodes are upgraded, and the 3rd one is still upgrading but nothing is moving... I presume it will complete in the next 1-2 hours. I tried to run the drain command with --ignore-daemonsets --force but it told me the node was already drained. I tried to delete the pods, but they just come back and the upgrade does not move any faster. Any thoughts?
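For reference, the drain command I tried looked roughly like this (the node name is the one from the kube-proxy pod above; yours will differ):

# --ignore-daemonsets skips DaemonSet-managed pods, --force evicts unmanaged ones
kubectl drain gke-test-upgrade-cluster-default-pool-74f8edac-wblf --ignore-daemonsets --force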

Update #1

The mcrouter helm chart was installed like this:

helm install stable/mcrouter --name mcrouter --set controller=statefulset

The StatefulSet it created for the mcrouter part is:

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  labels:
    app: mcrouter-mcrouter
    chart: mcrouter-0.1.0
    heritage: Tiller
    release: mcrouter
  name: mcrouter-mcrouter
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: mcrouter-mcrouter
      chart: mcrouter-0.1.0
      heritage: Tiller
      release: mcrouter
  serviceName: mcrouter-mcrouter
  template:
    metadata:
      labels:
        app: mcrouter-mcrouter
        chart: mcrouter-0.1.0
        heritage: Tiller
        release: mcrouter
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: mcrouter-mcrouter
                release: mcrouter
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - -p 5000
        - --config-file=/etc/mcrouter/config.json
        command:
        - mcrouter
        image: jphalip/mcrouter:0.36.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: mcrouter-port
          timeoutSeconds: 5
        name: mcrouter-mcrouter
        ports:
        - containerPort: 5000
          name: mcrouter-port
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: mcrouter-port
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 256m
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 128Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/mcrouter
          name: config
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: mcrouter-mcrouter
        name: config
  updateStrategy:
    type: OnDelete

and here is the memcached StatefulSet:

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  labels:
    app: mcrouter-memcached
    chart: memcached-1.2.1
    heritage: Tiller
    release: mcrouter
  name: mcrouter-memcached
spec:
  podManagementPolicy: OrderedReady
  replicas: 5
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: mcrouter-memcached
      chart: memcached-1.2.1
      heritage: Tiller
      release: mcrouter
  serviceName: mcrouter-memcached
  template:
    metadata:
      labels:
        app: mcrouter-memcached
        chart: memcached-1.2.1
        heritage: Tiller
        release: mcrouter
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: mcrouter-memcached
                release: mcrouter
            topologyKey: kubernetes.io/hostname
      containers:
      - command:
        - memcached
        - -m 64
        - -o
        - modern
        - -v
        image: memcached:1.4.36-alpine
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: memcache
          timeoutSeconds: 5
        name: mcrouter-memcached
        ports:
        - containerPort: 11211
          name: memcache
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: memcache
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
  updateStrategy:
    type: OnDelete
status:
  replicas: 0

This is a somewhat complex question and I am definitely not sure my reasoning is right, but... let's try to understand what is happening.

You have an upgrade process running and 6 nodes in the cluster. The system upgrades the nodes one by one, using drain to remove all workload from each node.

The drain process itself respects your settings, the number of replicas, and the desired state of the workload, and those have a higher priority than the drain of the node itself.

During the drain, Kubernetes tries to schedule all your workload onto nodes where scheduling is still available. Scheduling is disabled on the node the system wants to drain, which you can see in its state: Ready,SchedulingDisabled.
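You can see both the node state and which pods are still sitting on the draining node with something like this (node name taken from the question; adjust as needed):

kubectl get nodes
kubectl get pods --all-namespaces -o wide | grep gke-test-upgrade-cluster-default-pool-74f8edac-wblf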

So, the Kubernetes scheduler tries to find a right place for your workload on all available nodes. It will wait as long as it needs to in order to place everything you describe in the cluster configuration.

Now the most important thing. You set replicas: 5 for your mcrouter-memcached. Because of podAntiAffinity it cannot run more than one replica per node, and a node running it must have enough free resources for it, which is calculated from the resources: block of the pod template.

So I think your cluster simply does not have enough resources to run a new replica of mcrouter-memcached on the remaining 5 nodes. For example, on the last node where a replica is still not running, you may not have enough memory because of other workloads.
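A rough way to check this theory (the pod name is just the one from the question; pick whichever replica is stuck Pending) is to look at the scheduler events for that pod and at what is already allocated on each node:

# The Events section will show why the pod cannot be scheduled,
# e.g. anti-affinity conflicts or "Insufficient memory"
kubectl describe pod mcrouter-memcached-0

# Shows the CPU/memory requests already allocated per node
kubectl describe nodes | grep -A 5 "Allocated resources"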

I think that if you set the replicas for mcrouter-memcached to 4, it will solve the problem. Alternatively, you could use somewhat more powerful instances for that workload, or add one more node to the cluster; that should also help.
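For example, scaling the StatefulSet down directly would look like this (a sketch; the chart will not automatically scale it back up, but a later helm upgrade with the old values would):

kubectl scale statefulset mcrouter-memcached --replicas=4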

I hope I have explained my logic well enough; ask me if something is not clear to you. But first, please try to solve the issue with the provided solution :)

The problem was a combination of the minAvailable value of a PodDisruptionBudget (part of the memcached helm chart, which is a dependency of the mcrouter helm chart) and the replicas value of the memcached statefulset. Both were set to 5, so none of the memcached pods could be evicted during the drain. I tried changing minAvailable to 4, but PDBs are immutable at this time. What I did instead was remove the helm chart and replace it:

helm delete --purge myproxy
helm install ./charts/mcrouter-0.1.0-croy.1.tgz --name myproxy --set controller=statefulset --set memcached.replicaCount=5 --set memcached.pdbMinAvailable=4

Once that was done, I was able to get the cluster to upgrade normally.
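For anyone hitting the same problem, the blocking budget is easy to spot before (or during) a drain; a quick check along these lines (I believe the PDB is named after the memcached sub-chart, so adjust to your release):

kubectl get pdb
kubectl describe pdb mcrouter-memcached   # look for an allowed-disruptions value of 0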

What I should have done (but only thought about afterwards) was to change the memcached replicas value to 6; that way I would not have needed to delete and replace the whole chart.
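In other words, something as simple as this would most likely have unblocked the drain, since with the anti-affinity rule that puts one memcached pod on each of the 6 nodes, and evicting one still leaves 5 healthy to satisfy the PDB:

kubectl scale statefulset mcrouter-memcached --replicas=6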

Thank you @AntonKostenko for trying to help me find this issue. This issue also helped me. Thanks to the folks in Slack@Kubernetes, especially to Paris who tried to get my issue more visibility, and to the volunteers of the Kubernetes Office Hours (which happened to be yesterday, lucky me!) for also taking a look. Finally, thank you to psycotica0 from Kubernetes Canada for also giving me some pointers.


 