
How to prevent scale-down of newly scaled-up pods, created by HPA in Kubernetes, for a specific period of time?

I have a Kubernetes cluster set up in DigitalOcean. The cluster is configured to auto-scale using HPA (Horizontal Pod Autoscaler). I want to prevent termination of any pod that was scaled up within the last hour, both to avoid thrashing and to reduce the bill. There are two reasons for this:

  1. Due to unpredictable traffic, new pods sometimes scale up and down multiple times in an hour. Because of the nature of the application, every 50-60 new users need a new pod to handle the traffic.
  2. DigitalOcean droplets are charged per hour. Even if a droplet was up for only 15 minutes, it is charged for a full hour. So we sometimes pay for 5 droplets in an hour when we could have paid for just 1.

I could not find anything related to this in the documentation. Any hack to achieve this would be helpful.

Yes, we can do this. I am currently running an experiment closely related to your question.

Try to measure the following things while autoscaling:

  1. Time taken for HPA to calculate the replicas needed.
  2. Time taken for a pod to spin up.
  3. Time taken for a droplet to spin up.
  4. Time taken for pods to spin down.
  5. Time taken for a droplet to spin down.

Case 1: Time taken for HPA to calculate the replicas needed (HPA)

HPA detects changes as soon as it gets metrics, which happens within about 15 secs. This depends on horizontal-pod-autoscaler-sync-period, which is set to 15 secs by default. As soon as HPA gets the metrics, it calculates the replicas needed.
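For context, horizontal-pod-autoscaler-sync-period is a kube-controller-manager flag. A minimal sketch of where it would be set on a self-managed control plane follows. The static pod layout is an assumption about a kubeadm-style cluster, and on a managed control plane such as DigitalOcean's you most likely cannot change this flag at all.

```yaml
# Fragment of a kube-controller-manager static pod manifest
# (assumed kubeadm-style layout; a real manifest carries many more flags).
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    - --horizontal-pod-autoscaler-sync-period=15s   # the default value
```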

Case 2: Time taken for a pod to spin up (HPA)

As soon as HPA calculates the desired replicas, pods start spinning up. How fast this happens depends on the scaleUp policy, which you can set as per your use case. It also depends on droplet availability and on the cluster autoscaler.

For example, you can tell HPA: spin up 4 pods in 15 secs, OR spin up 100% of the currently running pods in 20 secs.

HPA then selects whichever policy has the greater impact, that is, the largest change in replica count (this is selectPolicy: Max, the default). If 100% of the current pods is more than 4 pods, the second policy takes over; otherwise the first policy takes over. The process repeats until the desired replica count is reached.
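The two example policies above could be written in the HPA spec roughly as follows. This is only a sketch of the scaleUp fragment, and the numbers are the illustrative ones from the example, not a recommendation.

```yaml
behavior:
  scaleUp:
    selectPolicy: Max        # pick the policy that allows the largest change
    policies:
    - type: Pods             # add at most 4 pods ...
      value: 4
      periodSeconds: 15      # ... per 15-second period
    - type: Percent          # or add at most 100% of the current replicas ...
      value: 100
      periodSeconds: 20      # ... per 20-second period
```

With 3 replicas running, the Pods policy allows 4 new pods while the Percent policy allows only 3, so the Pods policy wins. With 10 replicas, the Percent policy allows 10 and takes over.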

If you need the scaled-up pod count immediately, set the policy to spin up 100% of pods in 1 sec. HPA then tries to add 100% of the current replica count every second until it matches the desired replica count.

Case 3: Time taken for a droplet to spin up (Cluster Autoscaler)

Time taken for:

  • Cluster autoscaler to detect pending pods and start spinning up a droplet: 1 min 05 secs (approx)
  • Droplet to spin up, but still in NotReady state: 1 min 20 secs
  • Droplet to reach Ready state: 10 - 20 secs

Total time taken for a droplet to become available: 2 min 40 secs (approx)

Case 4: Time taken for pods to spin down (HPA)

It depends on the scaleDown policy, just as in Case 2.

Case 5: Time taken for a droplet to spin down (Cluster Autoscaler)

Once all the target pods on the droplet have terminated (the time taken depends on Case 4), DigitalOcean sets a taint on the node like DeletionCandidate...=<timestamp>:PreferNoSchedule

Ten mins after the taint is set, the droplet starts spinning down.

Conclusion:

If you need a node to stay alive for close to one hour (to make the most of the hourly charge) but not cross one hour (anything above 1 hr is billed as 2 hrs):

You can set stabilizationWindowSeconds = 1 hr - DigitalOcean's time interval before deletion

Theoretically, stabilizationWindowSeconds = 1 hr - 10 mins = 50 mins (3000 secs)

In practice, the time taken for all pods to terminate may vary depending on the scale-down policy, your application, etc.

So I set approximately (for my case) stabilizationWindowSeconds = 1 hr - 20 mins = 40 mins (2400 secs)

Thus, your scaled-up pods stay alive for 40 mins and start terminating after that (in my case all pods terminated within 5 mins at most). That leaves the remaining 15 mins for DigitalOcean to destroy the droplet.

CAUTION: The times calculated depend on my use case, environment, etc.

Adding my HPA behavior config for reference:

behavior:
    scaleDown:
      stabilizationWindowSeconds: 2400
      selectPolicy: Max
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0
      selectPolicy: Max
      policies:
      - type: Percent
        value: 100
        periodSeconds: 1
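
For context, the behavior block sits under spec in a HorizontalPodAutoscaler manifest. The sketch below shows one way the whole object might look. The names web-hpa and web, the replica bounds, and the CPU target are hypothetical placeholders, not values taken from this setup.

```yaml
apiVersion: autoscaling/v2        # older clusters expose behavior under autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa                   # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                     # hypothetical Deployment
  minReplicas: 1                  # placeholder
  maxReplicas: 10                 # placeholder
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70    # placeholder
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 2400   # 40 mins, as calculated above
      selectPolicy: Max
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0      # scale up without delay
      selectPolicy: Max
      policies:
      - type: Percent
        value: 100
        periodSeconds: 1
```

Note that stabilizationWindowSeconds accepts values from 0 to 3600, so a one-hour window is the upper limit for this approach.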
