
Node pool cluster is not autoscaling

We have created a GKE cluster in europe-west2, using zones A and B. The cluster is configured with:

Number of nodes: 1 (2 in total)
Autoscaling: yes (1-4 nodes per zone)
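
For reference, a regional node pool with this kind of zonal autoscaling is usually created along these lines (the cluster name and exact flags below are placeholders, not our actual command):

# --num-nodes / --min-nodes / --max-nodes apply per zone for a regional cluster
gcloud container clusters create my-cluster \
    --region europe-west2 \
    --node-locations europe-west2-a,europe-west2-b \
    --num-nodes 1 \
    --enable-autoscaling \
    --min-nodes 1 \
    --max-nodes 4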

We are trying to test the autoscaling, but the cluster cannot schedule any pods and no additional nodes are being added.

W 2019-11-11T14:03:17Z unable to get metrics for resource cpu: no metrics returned from resource metrics API 
W 2019-11-11T14:03:20Z unable to get metrics for resource cpu: no metrics returned from resource metrics API 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:45Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:45Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:45Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:45Z 0/4 nodes are available: 4 Insufficient cpu. 
W 2019-11-11T14:04:51Z unable to get metrics for resource cpu: no metrics returned from resource metrics API 
I 2019-11-11T14:04:53Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:05:03Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:05:03Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:05:03Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:05:03Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached

Around 80% of our pods fail to schedule and show up in an error state, yet we never see the cluster grow (neither physically nor horizontally).

We started from a 2-node setup and ran a load test to push it to its maximum: CPU hit 100% on both nodes and RAM hit 95% on both. We then get these messages:

I 2019-11-11T16:01:21Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T16:01:21Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T16:01:21Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T16:01:21Z Ensuring load balancer 
W 2019-11-11T16:01:24Z Error creating load balancer (will retry): failed to ensure load balancer for service istio-system/istio-ingressgateway: failed to ensure a static IP for load balancer (a72c616b7f5cf11e9b4694201ac10480(istio-system/istio-ingressgateway)): error getting static IP address: googleapi: Error 404: The resource 'projects/gc-lotto-stage/regions/europe-west2/addresses/a72c616b7f5cf11e9b4694201ac10480' was not found, notFound 
W 2019-11-11T16:01:25Z missing request for cpu 
W 2019-11-11T16:01:25Z missing request for cpu 
W 2019-11-11T16:01:26Z missing request for cpu 
I 2019-11-11T16:01:31Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
W 2019-11-11T16:01:35Z missing request for cpu 
W 2019-11-11T16:01:44Z 0/2 nodes are available: 2 Insufficient cpu. 
W 2019-11-11T16:01:44Z 0/2 nodes are available: 2 Insufficient cpu. 

It also depends on the node size you have configured:

First, check the node's allocatable resources:

kubectl describe node <node>
Allocatable:
  cpu:                4
  ephemeral-storage:  17784772Ki
  hugepages-2Mi:      0
  memory:             4034816Ki
  pods:               110

Also check the resources that have already been allocated:

kubectl describe node <node>

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1505m (37%)   3 (75%)
  memory             2750Mi (69%)  6484Mi (164%)
  ephemeral-storage  0 (0%)        0 (0%)

Then look at the resource requests:

If a pod's CPU/memory requests exceed the node's allocatable resources, the node pool may not scale up: a node needs enough allocatable capacity to fit the pod's requests.
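
A quick way to see what each pod actually requests (these are standard pod spec fields; run it against the namespace your workloads live in):

# List CPU and memory requests per pod, one column entry per container
kubectl get pods -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'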

Note that allocatable resources are less than the node's actual capacity, because part of the capacity is reserved for system daemons.
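
To see that difference directly, you can print both fields for a node (the node name is a placeholder):

# Capacity is the full machine size; Allocatable is what is left for pods
kubectl get node <node> -o jsonpath='{.status.capacity}{"\n"}{.status.allocatable}{"\n"}'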

I had the same issue for a while, and after a lot of research and tracing I found that if you want cluster autoscaling in GKE to work, there are a few things you have to keep in mind:

  1. Set resource requests and limits for every workload you can (see the sketch after this list).

  2. Autoscaling works based on requests, not limits. So if the sum of all your workloads' requests exceeds the total resources available in the node pool, you will see it scale up.
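
As a minimal sketch of point 1, a workload with explicit requests and limits could be declared and applied like this (the name, image and numbers are placeholders; the requests are what the scheduler and autoscaler reason about):

# Apply a deployment with explicit requests and limits (names and values are placeholders)
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: app
        image: nginx:1.25
        resources:
          requests:        # scheduling and scale-up decisions are based on these
            cpu: 250m
            memory: 256Mi
          limits:          # limits only cap usage, they do not trigger scale-up
            cpu: 500m
            memory: 512Mi
EOF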

This is what worked for me.

Hope it helps.
