
Azure Kubernetes Service: How to move pods from a Spot node pool to a regular node pool automatically?

I have 2 node pools in my Azure Kubernetes cluster: one is a Spot VM node pool and the other is a regular VM node pool. I have deployed 2 pods on the Spot node pool. If the Spot nodes get evicted, I want the pods that were running on them to be rescheduled onto the regular node pool automatically. How can I do that?

I have learned about node affinity and node selectors, which are used to run pods on specific nodes. It would be helpful if Kubernetes offered a feature that automatically migrates the pods to another node when the Spot node pool / Spot instances get evicted.

Does anyone know how we can achieve this in Kubernetes?

Thanks.

Kubernetes version used: 1.18.14

You could put a NoSchedule taint on the Spot nodes. That will not evict any pods already running on those nodes, but it will prevent any new pods from being scheduled onto them (unless a pod specifies a matching toleration).
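To make those two NoSchedule rules concrete, here is a small, self-contained Python model. It is not Kubernetes code: the `Taint`/`Node` classes and the `tolerates`/`can_schedule`/`apply_taint` functions are hypothetical, and toleration matching is simplified to exact equality of key, value, and effect. It only encodes what the paragraph above states: tainting a node leaves its running pods alone, and new pods are rejected unless they tolerate the taint.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # only "NoSchedule" is modeled in this sketch


@dataclass
class Node:
    name: str
    taints: list = field(default_factory=list)
    running: list = field(default_factory=list)  # names of pods already on the node


def tolerates(tolerations, taint):
    # Simplified "Equal" matching: key, value and effect must all be equal.
    return any(
        t.get("key") == taint.key
        and t.get("value") == taint.value
        and t.get("effect") == taint.effect
        for t in tolerations
    )


def can_schedule(node, tolerations):
    # A NEW pod may land on the node only if it tolerates every NoSchedule taint.
    return all(tolerates(tolerations, t) for t in node.taints if t.effect == "NoSchedule")


def apply_taint(node, taint):
    # Adding a NoSchedule taint does not touch pods that are already running.
    node.taints.append(taint)


spot = Node("spot-1", running=["app-a", "app-b"])
apply_taint(spot, Taint("key1", "value1", "NoSchedule"))

print(spot.running)           # the two existing pods are still there
print(can_schedule(spot, []))  # a pod without a toleration is rejected
matching = [{"key": "key1", "operator": "Equal", "value": "value1", "effect": "NoSchedule"}]
print(can_schedule(spot, matching))  # a pod with the matching toleration is accepted
```

The `key1`/`value1` pair mirrors the placeholder used in the answer; on a real cluster it would be the label you pick for the Spot nodes.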

You can have a look at this documentation page for more details: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/

In general, the steps are as follows:

The Spot nodes should carry one or more unique labels; you can find them by describing one of the nodes. Use such a label to taint all the Spot nodes with a NoSchedule taint, like so:

kubectl taint nodes node1 key1=value1:NoSchedule 

(replace key1=value1 with the label you found, and node1 with the name of each Spot node)
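The two steps above (find a label that only the Spot nodes carry, then taint every node that has it) can be sketched in Python. This is a hedged illustration: the node names and label sets are made-up sample data standing in for what `kubectl describe node` or `kubectl get nodes --show-labels` would show, and `spot_only_labels`/`taint_command` are hypothetical helpers. The sample label `kubernetes.azure.com/scalesetpriority=spot` is the one AKS normally puts on Spot nodes; the AKS documentation also describes a default `kubernetes.azure.com/scalesetpriority=spot:NoSchedule` taint on Spot pools, so check your nodes' existing labels and taints first. The generated command uses the `-l` label selector of `kubectl taint` to taint all matching nodes at once instead of naming `node1`.

```python
import shlex

# Hypothetical sample data: node name -> labels.
NODES = {
    "aks-spot-1": {"kubernetes.io/os": "linux", "kubernetes.azure.com/scalesetpriority": "spot"},
    "aks-spot-2": {"kubernetes.io/os": "linux", "kubernetes.azure.com/scalesetpriority": "spot"},
    "aks-regular-1": {"kubernetes.io/os": "linux"},
}


def spot_only_labels(nodes, spot_names):
    """Return the (key, value) labels carried by every Spot node and by no other node."""
    common = set.intersection(*(set(nodes[n].items()) for n in spot_names))
    others = set().union(
        *(set(labels.items()) for name, labels in nodes.items() if name not in spot_names)
    )
    return sorted(common - others)


def taint_command(key, value, effect="NoSchedule"):
    """Build a kubectl command that taints every node labeled key=value."""
    selector = f"{key}={value}"
    return ["kubectl", "taint", "nodes", "-l", selector, f"{selector}:{effect}"]


labels = spot_only_labels(NODES, ["aks-spot-1", "aks-spot-2"])
print(labels)  # the shared kubernetes.io/os label is filtered out
key, value = labels[0]
print(shlex.join(taint_command(key, value)))
```

Reusing the label itself as the taint key/value follows the answer's suggestion; any key/value works as long as the tolerations use the same one.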

For all pods that you still want to be scheduled onto the Spot nodes (such as system pods), add the following toleration so they can continue to be scheduled there:

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
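How a toleration like the one above is matched against a taint can be sketched as follows. This is a simplified Python model of the matching rules described on the Kubernetes taints-and-tolerations page linked above (`Equal` compares key and value, `Exists` ignores the value, an empty effect matches every effect, and an empty key with `Exists` matches every taint). It is not the scheduler's actual code, and the `Taint`/`Toleration`/`tolerates` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # Kubernetes defaults the operator to Equal
    value: str = ""
    effect: str = ""  # empty means "matches any effect"


def tolerates(tol: Toleration, taint: Taint) -> bool:
    # A non-empty effect must match the taint's effect exactly.
    if tol.effect and tol.effect != taint.effect:
        return False
    # An empty key is only valid with Exists, and then it matches every key/value.
    if tol.key == "":
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.operator == "Equal" and tol.value == taint.value


spot_taint = Taint("key1", "value1", "NoSchedule")
system_pod = Toleration(key="key1", operator="Equal", value="value1", effect="NoSchedule")

print(tolerates(system_pod, spot_taint))                            # matches
print(tolerates(Toleration(key="key1", value="other"), spot_taint))  # wrong value
print(tolerates(Toleration(operator="Exists"), spot_taint))          # wildcard
```

Pods that should move off the Spot nodes simply have no toleration, so `tolerates` is never satisfied for them.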

For the pods you mentioned in your question, you don't want them rescheduled back onto a Spot node once it goes down, so just don't add the toleration to them.

Assuming your pods are managed by a Deployment or a StatefulSet (or any other controller that recreates a pod when it loses one), when your pods are evicted because their Spot node goes away, the replacement pods can no longer be scheduled onto the Spot nodes. Since the only other option is the regular nodes, they will be scheduled there.
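The rescheduling argument above can be checked with a toy Python simulation. It is a hedged sketch, not how the kube-scheduler or the Deployment controller is implemented: the node and pod names are made up, toleration matching is reduced to exact (key, value, effect) equality, and "eviction" simply removes the Spot node so that each lost pod is recreated on the first node that accepts it.

```python
from dataclasses import dataclass, field

SPOT_TAINT = ("key1", "value1", "NoSchedule")  # placeholder taint from the answer


@dataclass
class Node:
    name: str
    taints: set = field(default_factory=set)
    pods: list = field(default_factory=list)


def accepts(node, tolerations):
    # A new pod must tolerate every NoSchedule taint on the node.
    return all(t in tolerations for t in node.taints if t[2] == "NoSchedule")


def reschedule(nodes, pod, tolerations):
    # Place a controller-created replacement pod on the first node that accepts it.
    for node in nodes:
        if accepts(node, tolerations):
            node.pods.append(pod)
            return node.name
    return None  # no node accepts it: the pod would stay Pending


# The app pods were placed on spot-1 before the taint was added; NoSchedule left them there.
spot1 = Node("spot-1", {SPOT_TAINT}, ["app-a", "app-b"])
spot2 = Node("spot-2", {SPOT_TAINT})
regular = Node("regular-1")
nodes = [spot1, spot2, regular]

# spot-1 is reclaimed: the node and its pods disappear.
lost = spot1.pods
nodes.remove(spot1)

# The controller recreates the lost pods; they carry no toleration for the Spot taint,
# so the remaining Spot node rejects them and they land on the regular node.
placements = {pod: reschedule(nodes, pod, tolerations=set()) for pod in lost}
print(placements)
```

A pod that does tolerate the taint (for example a system pod) would still be placed on `spot-2`, which is the behavior the answer relies on.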

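Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.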

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM