Using jinja2 in Airflow to build a list for KubernetesPodOperator

Question

We have an application running in a pod that I want to trigger with Airflow. The application processes a lot of entities and takes a long time. The nature of our setup is that some of these might fail, and we want to be able to re-run using only one or a few of the entities:

my_program # Run full application

my_program -e entity1 -e entity2 # Run application limited to entity1 and entity2.
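For reference, the list of strings that the second command line corresponds to can be sketched in plain Python. This is only an illustration of the target shape: `build_arguments` and its `flag` parameter are hypothetical names, not part of Airflow or of the application, and in a DAG the entities are only known at run time, so the helper could not simply be called while the DAG file is parsed.

```python
from typing import Iterable, List


def build_arguments(entities: Iterable[str] = (), flag: str = "-e") -> List[str]:
    """Hypothetical helper: expand entities into repeated flag/value pairs."""
    arguments = ["my_program"]
    for entity in entities:
        # One flag plus one value per entity, each as its own list element.
        arguments += [flag, str(entity)]
    return arguments


print(build_arguments())                        # full run, no entity filter
print(build_arguments(["entity1", "entity2"]))  # limited run
```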

My plan was to allow users to trigger the DAG again with a list of entities using "Trigger with config" in the Airflow UI, and have that limit the DAG run through the {{ dag_run.conf }} options.

The problem I now face is that the KubernetesPodOperator expects a list of strings, and I do not understand how to use jinja to construct a list whose length I do not know in advance.

This is what I tried, but of course the jinja will not be rendered this way. I understand how I can insert templated strings into the list, but not how I can do it when I do not know the length of the list in advance.

with DAG(
    "my_dag",
    description="Run my dag",
    schedule_interval="@daily",
    start_date=datetime.datetime(2021, 10, 14),
    default_args=default_args,
) as dag:

    entities = """{%- for entity in dag_run.conf['entities'] -%} -p {{ entity }} {% endfor %}"""
    arguments = list(filter(None, ['my_program', *entities.split(' ')]))

    t1 = KubernetesPodOperator(
        task_id="my_task_id",
        image="url_to_docker_image:latest",
        name="my_task_name",
        arguments=arguments,
        is_delete_operator_pod=True,
        env_vars={"AIRFLOW_RUN_ID": "{{ run_id }}"},
    )
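A note on why this first attempt cannot work: the split(' ') call runs when the DAG file is parsed, so it splits the raw template text rather than anything rendered. The sketch below repeats only those two lines in plain Python, with no Airflow involved, to show what the operator actually receives.

```python
# The same two lines as in the DAG above, run outside Airflow.
entities = """{%- for entity in dag_run.conf['entities'] -%} -p {{ entity }} {% endfor %}"""
arguments = list(filter(None, ["my_program", *entities.split(" ")]))

# The list is already fixed at this point, before any DAG run or conf exists.
for item in arguments:
    print(repr(item))
```

Every element is a fragment such as '{%-' or '{{'. As far as I understand, Airflow renders a list-valued templated field element by element, so none of these fragments is a complete template on its own and the number of elements never depends on the entities passed in.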

Edit: Here is my second attempt, using jinja and render_template_as_native_obj=True:

with DAG(
    "my_dag",
    description="Run my dag",
    schedule_interval="@daily",
    start_date=datetime.datetime(2021, 10, 14),
    default_args=default_args,
    render_template_as_native_obj=True,
) as dag:


    arguments = """['my_program', {% if entities is defined %}
      {%- for entity in entities-%} '-p', '{{ entity }}', {% endfor %}
      {%- endif %}]
    """

    t1 = KubernetesPodOperator(
        task_id="my_task_id",
        image="url_to_docker_image:latest",
        name="my_task_name",
        arguments=arguments, # type: ignore
        is_delete_operator_pod=True,
        env_vars={"AIRFLOW_RUN_ID": "{{ run_id }}"},
    )

But this does not seem to be converted to a list properly:

HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Pod in version \\"v1\\" cannot be handled as a Pod: v1.Pod.Spec: v1.PodSpec.Containers: []v1.Container: v1.Container.Args: []string: decode slice: expect [ or n, but found \\", error found in #10 byte of ...|{\\"args\\": \\"['my_program|...

Answer 1

The second approach worked with a minor tweak. Of course the variable was not available in the example (which had been stripped down), so the parameters had to be fetched from dag_run.conf['entities'] instead of just entities.

The second problem was what counts as valid input for jinja to convert to a Python object: I had to remove the whitespace at the end of the string as well as the newline characters:

arguments = """['my_program', {% if dag_run.conf['entities'] is defined %}
  {%- for entity in dag_run.conf['entities']-%} '-p', '{{ entity }}', {% endfor %}
  {%- endif %}]
""".replace('\n','').strip()

Answer 2

You are on the right track with your second attempt, but the template in your arguments variable has an extra comma (',') after the last entity.

from jinja2.nativetypes import NativeEnvironment

# `arguments` is the template string from the second attempt in the question.
env = NativeEnvironment()
template = env.from_string(arguments)
print(template.render(entities=range(5)))

Outputs: ['my_program', '-p', '0', '-p', '1', '-p', '2', '-p', '3', '-p', '4', ]

If you change your arguments variable to this:

arguments = """
   ['my_program' {% if entities is defined %}
   {%- for entity in entities-%}, '-p', '{{ entity }}' {% endfor %}
   {%- endif %}]
   """

The output is now a string that Jinja can convert to a Python list: ['my_program' , '-p', '0' , '-p', '1' , '-p', '2' , '-p', '3' , '-p', '4' ]
