
Node pool cluster is not autoscaling

We have created a GKE cluster in europe-west2, using zones A and B. The cluster is configured with:

Number of nodes: 1 (2 in total)
Autoscaling: yes (1-4 nodes per zone)
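
For reference, a regional node pool with this kind of zonal autoscaling is usually created along these lines (the cluster name and exact flags below are placeholders, not our actual command):

# --num-nodes / --min-nodes / --max-nodes apply per zone for a regional cluster
gcloud container clusters create my-cluster \
    --region europe-west2 \
    --node-locations europe-west2-a,europe-west2-b \
    --num-nodes 1 \
    --enable-autoscaling \
    --min-nodes 1 \
    --max-nodes 4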

We are trying to test the autoscaling, but the cluster cannot schedule any pods and no additional nodes are being added.

W 2019-11-11T14:03:17Z unable to get metrics for resource cpu: no metrics returned from resource metrics API 
W 2019-11-11T14:03:20Z unable to get metrics for resource cpu: no metrics returned from resource metrics API 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:45Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:45Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:45Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:45Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:51Z unable to get metrics for resource cpu: no metrics returned from resource metrics API 
I 2019-11-11T14:04:53Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:05:03Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:05:03Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:05:03Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:05:03Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached

Around 80% of our pods fail to schedule and show up in an error state, yet we never see the cluster grow (neither physically nor horizontally).

We started from a 2-node setup and ran a load test to push it to its maximum: CPU hit 100% on both nodes and RAM hit 95% on both. We then get these messages:

I 2019-11-11T16:01:21Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T16:01:21Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T16:01:21Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T16:01:21Z Ensuring load balancer 
W 2019-11-11T16:01:24Z Error creating load balancer (will retry): failed to ensure load balancer for service istio-system/istio-ingressgateway: failed to ensure a static IP for load balancer (a72c616b7f5cf11e9b4694201ac10480(istio-system/istio-ingressgateway)): error getting static IP address: googleapi: Error 404: The resource 'projects/gc-lotto-stage/regions/europe-west2/addresses/a72c616b7f5cf11e9b4694201ac10480' was not found, notFound 
W 2019-11-11T16:01:25Z missing request for cpu 
W 2019-11-11T16:01:25Z missing request for cpu 
W 2019-11-11T16:01:26Z missing request for cpu 
I 2019-11-11T16:01:31Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
W 2019-11-11T16:01:35Z missing request for cpu 
W 2019-11-11T16:01:44Z 0/2 nodes are available: 2 Insufficient cpu. 
W 2019-11-11T16:01:44Z 0/2 nodes are available: 2 Insufficient cpu. 

It also depends on the node size you have configured:

First, check the node's allocatable resources:

kubectl describe node <node>
Allocatable:
  cpu:                4
  ephemeral-storage:  17784772Ki
  hugepages-2Mi:      0
  memory:             4034816Ki
  pods:               110

Also check the resources that have already been allocated:

kubectl describe node <node>

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1505m (37%)   3 (75%)
  memory             2750Mi (69%)  6484Mi (164%)
  ephemeral-storage  0 (0%)        0 (0%)

Then look at the resource requests:

If a pod's CPU/memory requests exceed the node's allocatable resources, the node pool may not scale up: a node needs enough allocatable capacity to fit the pod's requests.
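
A quick way to see what each pod actually requests (these are standard pod spec fields; run it against the namespace your workloads live in):

# List CPU and memory requests per pod, one column entry per container
kubectl get pods -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'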

Note that allocatable resources are less than the node's actual capacity, because part of the capacity is reserved for system daemons.
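
To see that difference directly, you can print both fields for a node (the node name is a placeholder):

# Capacity is the full machine size; Allocatable is what is left for pods
kubectl get node <node> -o jsonpath='{.status.capacity}{"\n"}{.status.allocatable}{"\n"}'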

I had the same issue for a while, and after a lot of research and tracing I found that if you want cluster autoscaling in GKE to work, there are a few things you have to keep in mind:

  1. Set resource requests and limits for every workload you can (see the sketch after this list).

  2. Autoscaling works based on requests, not limits. So if the sum of all your workloads' requests exceeds the total resources available in the node pool, you will see it scale up.
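
As a minimal sketch of point 1, a workload with explicit requests and limits could be declared and applied like this (the name, image and numbers are placeholders; the requests are what the scheduler and autoscaler reason about):

# Apply a deployment with explicit requests and limits (names and values are placeholders)
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: app
        image: nginx:1.25
        resources:
          requests:        # scheduling and scale-up decisions are based on these
            cpu: 250m
            memory: 256Mi
          limits:          # limits only cap usage, they do not trigger scale-up
            cpu: 500m
            memory: 512Mi
EOF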

This is what worked for me.

Hope it helps.
