
Dynamically scale Pods as new nodes are added to a k8s cluster

Original post: 2020-10-20 07:43:56. Tags: kubernetes, affinity, kubernetes-statefulset, openebs

I am building an application in k8s where I want the replicas of Deployments/StatefulSets to scale with the number of nodes added.

Initially, the deployment should come up with 1 replica when the 1st node is created, grow as we add more worker/master nodes, and stop growing once the maximum is reached. I am using local storage and I don't want the StatefulSet Pods to all be scheduled on a single node.

Assume I have a deployment where I expect 2 replicas to run. Only one should come up when the 1st node is launched. Finally, when I have a 3-node master setup, it should have 2 replicas running on 2 nodes.

Is there any way I can achieve this? TIA

1 answer

There are various options.

DaemonSet

If you want exactly one replica of your app on every worker node, you can use a DaemonSet. However, I guess you want only up to a certain number of replicas, in which case a DaemonSet isn't a solution for your use case.

Pod anti-affinity

You can define a Pod anti-affinity for the Pods of your Deployment with a requiredDuringSchedulingIgnoredDuringExecution type and a topologyKey referring to a label that has a different value on each node (for example kubernetes.io/hostname). In this way, no two Pods of your Deployment will be scheduled to the same node.

For example, if you define three replicas in your Deployment and only two worker nodes are available, then two replicas will be scheduled on these two worker nodes and the third replica will remain pending until a third worker node is created, at which point it will be scheduled to that node.
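The anti-affinity rule described above can be sketched as a small Python script that builds the Deployment manifest and prints it as JSON. This is only an illustration: the name my-app, the image nginx:1.25, and the helper build_deployment are placeholders that do not come from the question, and kubernetes.io/hostname is assumed to be the per-node label used as the topologyKey.

```python
import json


def build_deployment(name: str, image: str, replicas: int) -> dict:
    """Build an apps/v1 Deployment whose Pods refuse to share a node.

    `name` and `image` are placeholders chosen for this sketch.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            # The maximum number of replicas you ever want to run.
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "affinity": {
                        "podAntiAffinity": {
                            # Hard rule: the scheduler must not place a Pod
                            # on a node that already runs a Pod with the
                            # same labels.
                            "requiredDuringSchedulingIgnoredDuringExecution": [
                                {
                                    "labelSelector": {"matchLabels": labels},
                                    # Assumed per-node label; its value
                                    # differs on every node.
                                    "topologyKey": "kubernetes.io/hostname",
                                }
                            ]
                        }
                    },
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Two replicas at most; with one node, one Pod runs and one stays Pending.
    print(json.dumps(build_deployment("my-app", "nginx:1.25", 2), indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed manifest could be piped into kubectl apply -f - to try it out. With replicas set to 2 and a single schedulable node, one Pod should run and the other should stay Pending until a second node joins.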

Operator

The most flexible solution is creating an operator. In this case, you create a new custom resource which encodes your desired deployment behaviour (e.g. the desired maximum number of replicas). You do this by defining a custom resource definition (CRD). You then create an operator, which is an application that interacts with the Kubernetes API and enforces this behaviour.

At runtime, this may then look as follows:

You create an instance of your custom resource → the operator becomes active, checks the declared number of replicas in the custom resource, checks the number of available worker nodes, and creates the appropriate number of replicas.
You add an additional node to the cluster → the operator becomes active, checks if there are any pending replicas in the instances of your custom resource, and if so, schedules one of them to the new node.
You remove a node from the cluster → the operator becomes active and makes sure the replicas on the removed node are not scheduled to another node but just become pending until a new node is created.
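The decision logic behind those three events can be sketched as a pure Python function. This is not a working operator: a real one would watch nodes and custom resources through the Kubernetes API, which is left out here. The names Plan, reconcile, max_replicas, and placed are hypothetical and were chosen only for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Plan:
    """Result of one reconcile pass (hypothetical type for this sketch)."""
    start_on: List[str] = field(default_factory=list)  # nodes that get a new replica
    pending: int = 0                                    # replicas still waiting for a node


def reconcile(max_replicas: int, nodes: List[str], placed: Dict[str, str]) -> Plan:
    """Decide where new replicas should start.

    max_replicas: the maximum declared in the custom resource.
    nodes:        names of the currently available worker nodes.
    placed:       replica name -> node name for replicas already placed.
    """
    # Replicas whose node has been removed are not moved elsewhere;
    # they simply stop counting as running.
    alive = {replica: node for replica, node in placed.items() if node in nodes}

    # At most one replica per node, so only unoccupied nodes qualify.
    busy = set(alive.values())
    free = [node for node in nodes if node not in busy]

    # How many replicas are still missing to reach the declared maximum.
    missing = max(max_replicas - len(alive), 0)
    start_on = free[:missing]
    return Plan(start_on=start_on, pending=missing - len(start_on))


if __name__ == "__main__":
    print(reconcile(2, ["node-1"], {}))
    print(reconcile(2, ["node-1", "node-2", "node-3"], {"replica-0": "node-1"}))
    print(reconcile(2, ["node-1"], {"replica-0": "node-1", "replica-1": "node-2"}))
```

The three calls at the bottom mirror the three events: creating the resource with one node available, adding nodes, and removing a node. Keeping the decision in a pure function like this makes it easy to test separately from whatever API client the operator ends up using.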

You can extend this behaviour however you like, since an operator can implement any logic you want.