Kubernetes Pod Horizontal Autoscaling safe drain, Celery worker scales down mid-work
I have a Kubernetes cluster on GKE. Among others, my current layout has a Pod (worker-pod) configured with a Horizontal Pod Autoscaler, which scales on an external metric provided to Stackdriver by BlueMedora's BindPlane.
The autoscaling works perfectly, but sometimes when it's time to scale down, a pod gets drained in the middle of a task that then never finishes.
The pod runs a Celery worker, while the job queues are managed by another Pod running RabbitMQ. I'm not sure whether to fix this on the K8s side or the RabbitMQ side.
My pod specification (simplified):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-worker
  labels:
    component: worker
spec:
  selector:
    matchLabels:
      app: pod-worker
  replicas: 1
  template:
    metadata:
      labels:
        app: pod-worker
        component: worker
    spec:
      containers:
      - name: worker
        image: custom-image:latest
        imagePullPolicy: Always
        command: ['celery']
        args: ['worker', '-A', 'celery_tasks.task', '-l', 'info', '-Q', 'default,priority', '-c', '1', '-Ofair']
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 150m
            memory: 200Mi
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
      restartPolicy: Always
---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: pod-worker
  labels:
    component: worker
spec:
  maxReplicas: 30
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pod-worker
  metrics:
  - external:
      metricName: external.googleapis.com|bluemedora|generic_node|rabbitmq|cluster|messages
      targetAverageValue: "40"
    type: External
To fix this you have multiple approaches. First, to avoid losing messages, use RabbitMQ manual ACKs: acknowledge a message only after the work succeeds, so that if the worker dies mid-task the message is requeued and reprocessed.
Second, when the downscaling starts, the pod is sent a SIGTERM signal, and Kubernetes then waits for the period set by this variable (in the podSpec):
terminationGracePeriodSeconds: 90
So you can tune that variable, setting it a little higher, so the worker has time to finish its current task and shut down gracefully. After terminationGracePeriodSeconds has passed, the pod receives a SIGKILL signal, which kills it.
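In the Deployment above, the field would sit at the pod spec level, next to `containers`. A sketch; the value 300 is arbitrary and should be chosen to exceed your longest task:

```yaml
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 300  # longer than your longest task
      containers:
      - name: worker
        image: custom-image:latest
```

The default is 30 seconds, which is often too short for queue workers.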
Also, you can handle these signals in Python; here is a small example:
import signal
import time

class GracefulKiller:
    kill_now = False

    def __init__(self):
        # Catch both Ctrl-C and the SIGTERM sent by Kubernetes on drain.
        signal.signal(signal.SIGINT, self.exit_gracefully)
        signal.signal(signal.SIGTERM, self.exit_gracefully)

    def exit_gracefully(self, signum, frame):
        self.kill_now = True

if __name__ == '__main__':
    killer = GracefulKiller()
    while not killer.kill_now:
        time.sleep(1)
        print("doing something in a loop ...")
    print("End of the program. I was killed gracefully :)")
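If you want to check the handler locally before deploying, you can simulate the drain by delivering a SIGTERM to the process from a timer thread. A quick Unix-only sketch of the same pattern, re-declared so the snippet is self-contained:

```python
import os
import signal
import threading
import time

class GracefulKiller:
    kill_now = False

    def __init__(self):
        signal.signal(signal.SIGTERM, self.exit_gracefully)

    def exit_gracefully(self, signum, frame):
        self.kill_now = True

killer = GracefulKiller()
# Simulate a pod drain: send SIGTERM to ourselves after 0.2 seconds.
threading.Timer(0.2, lambda: os.kill(os.getpid(), signal.SIGTERM)).start()
loops = 0
while not killer.kill_now:
    time.sleep(0.05)
    loops += 1
print("loop exited cleanly after", loops, "iterations")
```

The sleep is interrupted, the handler flips `kill_now`, and the loop exits instead of the process being torn down mid-iteration.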