
Scheduling and scaling pods in Kubernetes

I am running a Kubernetes (k8s) cluster on GKE.

It has 4 node pools with different configurations:

Node pool : 1 (single node, cordoned)

Running Redis & RabbitMQ

Node pool : 2 (single node, cordoned)

Running Monitoring & Prometheus

Node pool : 3 (single large node)

Application pods

Node pool : 4 (single node, auto-scaling enabled)

Application pods

Currently I am running a single replica of each service on GKE,

except for the main service, which mostly manages everything and runs with 3 replicas.

When scaling this main service with HPA, I have sometimes seen the node crash, or the kubelet restart frequently and the pods go into an Unknown state.

How do I handle this scenario? If the node crashes, GKE takes time to auto-repair it, which causes service downtime.
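For illustration, a minimal sketch of the kind of HPA object involved (the name, replica bounds and CPU target here are placeholders, not my actual configuration):

```yaml
# Hypothetical HPA for the main service; name, replica bounds and the
# CPU target are assumptions for illustration only.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: main-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: main-service
  minReplicas: 3
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```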

Question : 2

Node pools 3 and 4 run the application pods. Inside the application, there are 3-4 memory-intensive microservices, and I am also thinking of using a node selector to pin them to one node (rough sketch below),

while only a small node pool will run the main service, with HPA and node auto-scaling working for that node pool.

However, I feel like a node selector is not the best way to do it.

It is always best to run more than one replica of each service, but currently we are running only a single replica of each service, so please suggest with that in mind.
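A rough sketch of the node selector approach I have in mind, using one of the memory-intensive services and the node pool label that GKE puts on its nodes (the service, image and pool names are placeholders):

```yaml
# Sketch only: service name, image and pool name are assumptions.
# GKE labels every node with cloud.google.com/gke-nodepool=<pool-name>,
# so a nodeSelector on that label pins the pod to a specific pool.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: memory-intensive-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: memory-intensive-service
  template:
    metadata:
      labels:
        app: memory-intensive-service
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: pool-3
      containers:
        - name: app
          image: example/memory-intensive-service:latest
```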

As Patrick W rightly suggested in his comment:

if you have a single node, you leave yourself with a single point of failure. Also keep in mind that autoscaling takes time to kick in and is based on resource requests. If your node suffers OOM because of memory intensive workloads, you need to readjust your memory requests and limits – Patrick W, Oct 10

You may need to redesign your infrastructure a bit so that you have more than a single node in every node pool, as well as readjust memory requests and limits.
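For example, a minimal sketch of memory requests and limits on a container (the numbers are placeholders and should be derived from observed usage, not copied):

```yaml
# Sketch only: names and values are assumptions.
# The request is what the scheduler and cluster autoscaler reason about;
# the limit is the point at which the container gets OOM-killed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: main-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: main-service
  template:
    metadata:
      labels:
        app: main-service
    spec:
      containers:
        - name: app
          image: example/main-service:latest
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              memory: "1Gi"
```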

You may also want to take a look at the relevant sections of the official Kubernetes docs and the Google Cloud blog.

How do I handle this scenario? If the node crashes, GKE takes time to auto-repair it, which causes service downtime.

That's why having more than just one node in a single node pool can be a much better option. It greatly reduces the likelihood that you'll end up in the situation described above. GKE's auto-repair feature needs to take its time (usually a few minutes), and if this is your only node, you cannot do much about it and need to accept possible downtime.
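Once a node pool has more than one node, it also helps to run at least two replicas and spread them across nodes, so that the auto-repair of one node does not take the whole service down. A sketch, assuming a hypothetical main-service Deployment and preferred pod anti-affinity on the hostname topology key:

```yaml
# Sketch only: names are assumptions. The scheduler will try to place the
# replicas on different nodes, but will still schedule them together if it must.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: main-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: main-service
  template:
    metadata:
      labels:
        app: main-service
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: main-service
                topologyKey: kubernetes.io/hostname
      containers:
        - name: app
          image: example/main-service:latest
```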

Node pools 3 and 4 run the application pods. Inside the application, there are 3-4 memory-intensive microservices, and I am also thinking of using a node selector to pin them to one node,

while only a small node pool will run the main service, with HPA and node auto-scaling working for that node pool.

However, I feel like a node selector is not the best way to do it.

You may also take a look at node affinity and anti-affinity, as well as taints and tolerations.
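As a rough sketch of how those two mechanisms combine, assuming a node pool dedicated to memory-intensive workloads (the taint key, pool name and image are assumptions): the nodes are tainted so that ordinary pods stay off them, and the memory-intensive pods both tolerate the taint and prefer that pool via node affinity.

```yaml
# Sketch only: taint key/value and pool name are assumptions.
# 1) Taint the dedicated nodes (in GKE a taint can also be set on the node pool):
#      kubectl taint nodes <node-name> workload=memory-intensive:NoSchedule
# 2) Pod spec for the memory-intensive services:
apiVersion: v1
kind: Pod
metadata:
  name: memory-intensive-example
spec:
  tolerations:
    - key: "workload"
      operator: "Equal"
      value: "memory-intensive"
      effect: "NoSchedule"
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: cloud.google.com/gke-nodepool
                operator: In
                values: ["pool-3"]
  containers:
    - name: app
      image: example/memory-intensive-service:latest
```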
