
Kubernetes HPA and Scaling Down

I have a Kubernetes HPA (Horizontal Pod Autoscaler) set up in my cluster, and it works as expected, scaling pods up and down as CPU/memory usage increases and decreases.

The problem is that my pods handle web requests, so the HPA occasionally scales down a pod while it is still handling a request. The web server never gets a response from the removed pod, so the caller of the web API gets an error back.

This all makes sense in theory. My question is: is there a best practice for handling this? Is there some way to wait until all in-flight requests are processed before scaling down, or some other way to ensure that requests complete before the HPA removes the pod?

I can think of a few solutions, none of which I like:

  1. Add a retry mechanism to the caller and leave the cluster as is.
  2. Don't use the HPA for pods that serve web requests (which seems to defeat the purpose).
  3. Create some sort of custom metric and see if I can feed it into Kubernetes (e.g. https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-custom-metrics ).

Any suggestions would be appreciated. Thanks in advance!

Graceful shutdown of pods

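You must design your apps to support graceful shutdown. When a pod is terminated, its containers first receive a SIGTERM signal; if they are still running after the grace period (30 seconds by default, configurable with terminationGracePeriodSeconds), they receive a SIGKILL signal and the pod is removed. See Termination of Pods in the Kubernetes documentation.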

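SIGTERM: When your app receives the termination signal, the pod is being removed from the Service endpoints, so it should stop getting new requests. Your app should still finish responding to the requests it has already received. Endpoint removal runs concurrently with the SIGTERM, so a few new requests may still arrive in the first moments after the signal.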
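A minimal sketch of that SIGTERM handling, using only Python's standard library http.server. It assumes a threaded HTTP server; the names make_server and install_sigterm_handler and port 8080 are illustrative assumptions, not part of any Kubernetes API. On SIGTERM it stops the accept loop, then waits for the handler threads that are still serving requests before the process exits.

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8080):
    # Hypothetical helper; port 8080 is an assumption, match your containerPort.
    server = ThreadingHTTPServer(("0.0.0.0", port), Handler)
    # Non-daemon worker threads plus block_on_close make server_close()
    # wait for requests that are still being handled.
    server.daemon_threads = False
    server.block_on_close = True
    return server


def install_sigterm_handler(server):
    """On SIGTERM, stop accepting new connections; in-flight requests keep running."""
    def handle_sigterm(signum, frame):
        # shutdown() blocks until serve_forever() returns. Python runs signal
        # handlers on the main thread, which in main() is the thread inside
        # serve_forever(), so calling shutdown() here directly would deadlock.
        threading.Thread(target=server.shutdown, daemon=True).start()
    signal.signal(signal.SIGTERM, handle_sigterm)


def main():
    server = make_server()
    install_sigterm_handler(server)
    server.serve_forever()  # returns once shutdown() has been called
    server.server_close()   # closes the listening socket and joins in-flight handlers


if __name__ == "__main__":
    main()
```

The SIGTERM handler only stops the accept loop; any work that is still running must finish before the grace period ends, otherwise the SIGKILL cuts it off.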

Design for idempotency

Your apps should also be designed for idempotency so you can safely retry failed requests.
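To illustrate the idempotency point, here is a minimal Python sketch that pairs client-side retries with an idempotency key: the client generates one key per logical request and sends it with every attempt (for example as a request header). IdempotentStore, call_with_retry and demo are hypothetical names. The in-memory dict stands in for a store that, in a real cluster, would have to be shared by all pods (for example a database), because the retry will usually land on a different pod. The demo simulates a pod that performs the side effect but goes away before replying.

```python
import time
import uuid


class IdempotentStore:
    """Hypothetical server-side cache: remembers the result for each idempotency key,
    so a retried request returns the original result instead of repeating the side effect.
    Not thread-safe; a real implementation needs a shared store with atomic writes."""

    def __init__(self):
        self._results = {}

    def run_once(self, key, operation):
        if key not in self._results:
            self._results[key] = operation()
        return self._results[key]


def call_with_retry(send, attempts=3, base_delay=0.1):
    """Client side: reuse ONE idempotency key across all attempts, with exponential backoff."""
    key = str(uuid.uuid4())
    for attempt in range(attempts):
        try:
            return send(key)
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay * 2 ** attempt)


def demo():
    store = IdempotentStore()
    charges = []
    calls = {"n": 0}

    def charge():
        charges.append(10)  # the side effect we must not repeat
        return "charged 10"

    def flaky_send(key):
        calls["n"] += 1
        result = store.run_once(key, charge)
        if calls["n"] == 1:
            # Simulates a pod scaled down after doing the work but before replying.
            raise ConnectionError("pod went away before replying")
        return result

    return call_with_retry(flaky_send, base_delay=0), len(charges)


if __name__ == "__main__":
    print(demo())  # ('charged 10', 1)
```

The retry succeeds, and because both attempts carry the same key, the charge is applied only once.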
