
Scheduling and scaling pods in Kubernetes

I am running a k8s cluster on GKE.

It has 4 node pools with different configurations:

Node pool 1 (single node, cordoned):

Running Redis & RabbitMQ

Node pool 2 (single node, cordoned):

Running Monitoring & Prometheus

Node pool 3 (single large node):

Application pods

Node pool 4 (single node, autoscaling enabled):

Application pods

Currently, I am running a single replica of each service on GKE.

The exception is the main service, which manages almost everything; it runs 3 replicas.

When scaling this main service with the HPA, I have sometimes seen the node crash, or the kubelet restart frequently and the pods go into an Unknown state.

How should I handle this scenario? If the node crashes, GKE takes time to auto-repair it, which causes service downtime.
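For reference, a minimal sketch of the kind of HPA definition in use (the name main-service and all the numbers below are placeholders, not the exact config):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: main-service-hpa        # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: main-service          # hypothetical Deployment name
  minReplicas: 3
  maxReplicas: 6                # placeholder ceiling
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # placeholder target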

Question 2

Node pools 3 and 4 run the application pods. Within the application there are 3-4 memory-intensive microservices, and I am thinking of using a node selector to pin them to one node,

while only the small node pool would run the main service, which has the HPA, and node autoscaling would handle that pool.

However, I feel that a node selector is not the best way to do this.
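For illustration, the node selector approach I am considering would look roughly like this. GKE labels every node with cloud.google.com/gke-nodepool; the pool and image names below are placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: memory-heavy-service    # hypothetical service name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: memory-heavy-service
  template:
    metadata:
      labels:
        app: memory-heavy-service
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: pool-3   # placeholder pool name
      containers:
      - name: app
        image: gcr.io/my-project/memory-heavy-service:latest   # placeholder image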

It is always best to run more than one replica of each service, but currently we are running only a single replica of each, so please take that into account in your suggestions.

As Patrick W rightly suggested in his comment:

if you have a single node, you leave yourself with a single point of failure. Also keep in mind that autoscaling takes time to kick in and is based on resource requests. If your node suffers OOM because of memory intensive workloads, you need to readjust your memory requests and limits – Patrick W, Oct 10

You may need to redesign your infrastructure a bit so that you have more than a single node in every node pool, and readjust your memory requests and limits.
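As a minimal sketch of both points (all names and sizes below are placeholders to replace with your measured usage), a Deployment that runs more than one replica and declares explicit memory requests and limits could look like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: main-service            # placeholder name
spec:
  replicas: 2                   # more than one replica removes the single point of failure
  selector:
    matchLabels:
      app: main-service
  template:
    metadata:
      labels:
        app: main-service
    spec:
      containers:
      - name: app
        image: gcr.io/my-project/main-service:latest   # placeholder image
        resources:
          requests:
            memory: "512Mi"     # what the scheduler reserves; base it on observed usage
            cpu: "250m"
          limits:
            memory: "1Gi"       # the pod is OOM-killed above this instead of destabilizing the node

Both the scheduler and the cluster autoscaler work from these requests, so keeping them realistic is what lets autoscaling kick in before a node runs out of memory.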

You may want to take a look at the following sections in the official Kubernetes docs and the Google Cloud blog:

How should I handle this scenario? If the node crashes, GKE takes time to auto-repair it, which causes service downtime.

That's why having more than just one node in a single node pool can be a much better option: it greatly reduces the likelihood that you'll end up in the situation described above. The GKE autorepair feature needs time to do its work (usually a few minutes), and if this is your only node, you cannot do much about it and have to accept the possible downtime.

Node pools 3 and 4 run the application pods. Within the application there are 3-4 memory-intensive microservices, and I am thinking of using a node selector to pin them to one node,

while only the small node pool would run the main service, which has the HPA, and node autoscaling would handle that pool.

However, I feel that a node selector is not the best way to do this.

You may also take a look at node affinity and anti-affinity, as well as taints and tolerations.
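As a rough sketch of how those fit together (the taint key, pool, and image names are all placeholders, and this assumes the dedicated pool was created with a matching taint, for example via GKE node pool settings): the memory-intensive services tolerate the taint, are required to land on that pool via node affinity, and prefer to spread across its nodes via pod anti-affinity:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: memory-heavy-service    # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: memory-heavy-service
  template:
    metadata:
      labels:
        app: memory-heavy-service
    spec:
      tolerations:              # permits scheduling onto the tainted pool
      - key: dedicated          # placeholder taint key/value
        operator: Equal
        value: memory-heavy
        effect: NoSchedule
      affinity:
        nodeAffinity:           # requires the pods to land on the dedicated pool
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.google.com/gke-nodepool
                operator: In
                values: ["pool-3"]   # placeholder pool name
        podAntiAffinity:        # prefers spreading replicas across different nodes
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: memory-heavy-service
              topologyKey: kubernetes.io/hostname
      containers:
      - name: app
        image: gcr.io/my-project/memory-heavy-service:latest   # placeholder image

Note that the toleration alone only permits scheduling on the tainted nodes; it is the node affinity that actually pins the pods there, while the taint keeps every other workload out.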
