
Kubernetes Pod Horizontal Autoscaling safe drain, celery worker scales down mid-work

I have a Kubernetes cluster on GKE. Among other things, my current layout has a Pod (worker-pod) configured with a Horizontal Pod Autoscaler, which scales on an external metric provided on Stackdriver by BlueMedora's BindPlane.

The autoscaling works perfectly, but sometimes, when it's time to scale down, pods get drained in the middle of a task, and that task never finishes.

The pod is running a Celery worker, while the job queues are managed by another Pod running RabbitMQ. I'm not sure whether to fix this on the K8s side or the RabbitMQ side.

How can I prevent the HPA from scaling down a pod while it's still working on a task?

My pod specification (simplified):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-worker
  labels:
    component: worker
spec:
  selector:
    matchLabels:
      app: pod-worker
  replicas: 1
  template:
    metadata:
      labels:
        app: pod-worker
        component: worker
    spec:
      containers:
      - name: worker
        image: custom-image:latest
        imagePullPolicy: Always
        command: ['celery']
        args: ['worker','-A','celery_tasks.task','-l','info', '-Q', 'default,priority','-c','1', '-Ofair']
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 150m
            memory: 200Mi
        env:
         - name: POD_NAME
           valueFrom:
             fieldRef:
               fieldPath: metadata.name
      restartPolicy: Always
    
---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: pod-worker
  labels:
    component: worker
spec:
  maxReplicas: 30
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pod-worker
  metrics:
    - external:
        metricName: external.googleapis.com|bluemedora|generic_node|rabbitmq|cluster|messages
        targetAverageValue: "40"
      type: External

To fix this you have multiple approaches. First, to avoid losing messages mid-processing, use RabbitMQ manual ACKs: acknowledge a message only after the work has completed successfully, so that if the worker dies first, the task is requeued and reprocessed.
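
In Celery this corresponds to late acknowledgement. A minimal sketch, assuming a broker URL and a do_work task that are not in the original question (the module name celery_tasks.task is taken from the worker args above):

from celery import Celery

# Hypothetical broker URL; point it at your RabbitMQ service.
app = Celery('celery_tasks.task', broker='amqp://guest:guest@rabbitmq:5672//')

app.conf.update(
    task_acks_late=True,              # ACK only after the task returns, not when it is received
    task_reject_on_worker_lost=True,  # requeue the message if the worker process is killed
    worker_prefetch_multiplier=1,     # don't reserve extra messages the pod may never finish
)

@app.task
def do_work(payload):
    # Long-running job; the RabbitMQ message stays unacknowledged until this returns.
    ...

With this, a message that was being processed when the pod died goes back to the queue and is picked up by another worker instead of being lost.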

Second, when a scale-down starts, the pod is sent a SIGTERM signal, and Kubernetes waits for the grace period defined in the pod spec:

terminationGracePeriodSeconds: 90

So you can tune that value and set it a little higher, so the worker is able to finish its current task and shut down gracefully.

After the terminationGracePeriodSeconds have passed, the pod receives a SIGKILL signal, which kills it immediately.
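
For reference, the field sits at the pod level of the Deployment template, next to the container list; the sketch below reuses the Deployment from the question, and 90 seconds is just an example value:

spec:
  template:
    spec:
      terminationGracePeriodSeconds: 90   # time Kubernetes waits between SIGTERM and SIGKILL
      containers:
      - name: worker
        image: custom-image:latest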

Also, you can handle these signals in Python; here is a small example:

import signal
import time


class GracefulKiller:
    """Flip a flag when SIGINT or SIGTERM is received so the main loop can exit cleanly."""
    kill_now = False

    def __init__(self):
        signal.signal(signal.SIGINT, self.exit_gracefully)
        signal.signal(signal.SIGTERM, self.exit_gracefully)

    def exit_gracefully(self, signum, frame):
        self.kill_now = True


if __name__ == '__main__':
    killer = GracefulKiller()
    while not killer.kill_now:
        time.sleep(1)
        print("doing something in a loop ...")
    print("End of the program. I was killed gracefully :)")
