
Kubernetes Operator in Airflow is not sharing the load across nodes. Why?

I have Airflow 1.10.5 on a Kubernetes cluster.

The DAGs are written with the Kubernetes operator, so on execution each task inside a DAG spins up its own pod on the k8s cluster.

I have 10 worker nodes.

The pods created by Airflow all end up on the same node where Airflow itself is running. When many pods have to spin up, they all queue on that one node, which causes many pod failures due to lack of resources on the node.

At the same time, the other 9 nodes are barely used, since the heavy load comes only from the Airflow jobs.

How can I make Airflow use all the worker nodes of the k8s cluster?

I do not use any node affinity or node selector.

I solved this little 'issue' by manually attaching an affinity to the worker pods in the Helm chart values, as suggested by the documentation: Airflow Helm Chart.

workers:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - podAffinityTerm:
            labelSelector:
              matchLabels:
                component: worker
            topologyKey: kubernetes.io/hostname
          weight: 100

The Airflow Helm chart's values.yaml lists this affinity and says it is the default for workers:

  affinity: {}
  # default worker affinity is:
  #  podAntiAffinity:
  #    preferredDuringSchedulingIgnoredDuringExecution:
  #    - podAffinityTerm:
  #        labelSelector:
  #          matchLabels:
  #            component: worker
  #        topologyKey: kubernetes.io/hostname
  #      weight: 100

But it fails to mention that this default does not apply to the worker pods themselves; it only applies to the worker Deployment in worker-deployment.yaml, which is used under CeleryExecutor or CeleryKubernetesExecutor.

...
################################
## Airflow Worker Deployment
#################################
{{- $persistence := .Values.workers.persistence.enabled }}
{{- if or (eq .Values.executor "CeleryExecutor") (eq .Values.executor "CeleryKubernetesExecutor") }}
...

So if you do want to spread out your worker pods more, you need to add this affinity (or another custom affinity) to your worker pod template, which can be done through the Helm values.yaml.

That said, I don't think this will be considered an 'issue': most likely that particular node is free enough, so Kubernetes keeps scheduling worker pods onto it. When the system load goes up, Kubernetes will spread the worker pods out. And having worker pods on the same node might reduce network traffic between nodes in some cases.

But in my case, when all worker pods were scheduled onto the same node, pod initialization latency was higher than with a distributed workload, so I decided to spread them out across the cluster.
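
As an addendum for pods launched directly by the Kubernetes operator (as in the question), the same kind of anti-affinity can also be passed per task through KubernetesPodOperator's affinity argument. Below is a minimal sketch for Airflow 1.10.x; the DAG id, namespace, image and labels are illustrative assumptions, not taken from the original setup.

from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

# Prefer not to co-locate pods carrying the label app=airflow-task
# (same shape as the workers' anti-affinity above, expressed as a plain dict).
spread_affinity = {
    "podAntiAffinity": {
        "preferredDuringSchedulingIgnoredDuringExecution": [
            {
                "weight": 100,
                "podAffinityTerm": {
                    "labelSelector": {
                        "matchLabels": {"app": "airflow-task"}
                    },
                    "topologyKey": "kubernetes.io/hostname",
                },
            }
        ]
    }
}

with DAG(
    dag_id="example_spread_tasks",       # illustrative name
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
) as dag:
    spread_task = KubernetesPodOperator(
        task_id="spread_task",
        name="spread-task",
        namespace="airflow",             # assumed namespace
        image="python:3.7-slim",         # illustrative image
        cmds=["python", "-c", "print('hello from a task pod')"],
        labels={"app": "airflow-task"},  # must match the labelSelector above
        affinity=spread_affinity,
        is_delete_operator_pod=True,
    )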

