
Is it possible to update/overwrite the Airflow ['dag_run'].conf?

We typically start Airflow DAGs with the trigger_dag CLI command. For example:

airflow trigger_dag my_dag --conf '{"field1": 1, "field2": 2}'

We access this conf in our operators using context['dag_run'].conf.
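A minimal sketch of what that looks like inside one of our DAGs (the callable and task names here are illustrative):

def print_fields(**context):
    # conf holds whatever JSON was passed via --conf;
    # it is None/{} when nothing was provided.
    conf = context["dag_run"].conf or {}
    print("field1 =", conf.get("field1"), "field2 =", conf.get("field2"))

t_print_fields = PythonOperator(
    task_id="print_fields",
    python_callable=print_fields,
    provide_context=True,  # needed in Airflow 1.10 to receive **context
)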

Sometimes when the DAG breaks at some task, we'd like to "update" the conf and restart the broken task (and downstream dependencies) with this new conf. For example:

new conf --> {"field1": 3, "field2": 4}

Is it possible to "update" the dag_run conf with a new JSON string like this?

We'd be interested in hearing thoughts on this, other solutions, or ways to avoid this situation to begin with.

Working with Apache Airflow v1.10.3.

Thank you very much in advance.

Updating conf after a dag run has been created isn't as straightforward as reading from it, because once the run exists, conf is read back from the dag_run metadata table every time it's used. While Variables have methods to both write to and read from the metadata database, dag runs only let you read.
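For comparison, a minimal sketch of the Variable round trip (the key name here is illustrative):

from airflow.models import Variable

# Variables can round-trip through the metadata DB in both directions.
Variable.set("run_settings", {"field1": 3, "field2": 4}, serialize_json=True)
settings = Variable.get("run_settings", deserialize_json=True)

# DagRun exposes no comparable setter: once a run exists,
# context['dag_run'].conf is only read back from the dag_run table.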

I agree that Variables are a useful tool, but when you have k=v pairs that you only want to use for a single run, it gets complicated and messy.

Below is an operator that will let you update a dag_run's conf after instantiation (tested in v1.10.10):

#! /usr/bin/env python3
"""Operator to overwrite a dag run's conf after creation."""


import os
from typing import Dict

from airflow.models import BaseOperator
from airflow.utils.db import provide_session
from airflow.utils.decorators import apply_defaults
from airflow.utils.operator_helpers import context_to_airflow_vars


class UpdateConfOperator(BaseOperator):
    """Updates an existing DagRun's conf with `given_conf`.

    Args:
        given_conf: A dictionary of k:v values to update a DagRun's conf with. Templated.
        replace: Whether or not `given_conf` should replace conf (True)
                 or be used to update the existing conf (False).
                 Defaults to True.

    """

    template_fields = ("given_conf",)
    ui_color = "#ffefeb"

    @apply_defaults
    def __init__(self, given_conf: Dict, replace: bool = True, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.given_conf = given_conf
        self.replace = replace

    @staticmethod
    def update_conf(given_conf: Dict, replace: bool = True, **context) -> None:
        @provide_session
        def save_to_db(dag_run, session):
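            # Re-attach the (possibly detached) DagRun to this session so the
            # new conf is persisted to the dag_run metadata table.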
            session.add(dag_run)
            session.commit()
            dag_run.refresh_from_db()

        dag_run = context["dag_run"]
        # When there's no conf provided,
        # conf will be None if scheduled or {} if manually triggered
        if replace or not dag_run.conf:
            dag_run.conf = given_conf
        elif dag_run.conf:
            # Note: dag_run.conf.update(given_conf) doesn't work
            dag_run.conf = {**dag_run.conf, **given_conf}

        save_to_db(dag_run)

    def execute(self, context):
        # Export context to make it available for callables to use.
        airflow_context_vars = context_to_airflow_vars(context, in_env_var_format=True)
        self.log.debug(
            "Exporting the following env vars:\n%s",
            "\n".join(["{}={}".format(k, v) for k, v in airflow_context_vars.items()]),
        )
        os.environ.update(airflow_context_vars)

        self.update_conf(given_conf=self.given_conf, replace=self.replace, **context)

Example usage:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

CONF = {"field1": 3, "field2": 4}

with DAG(
    "some_dag",
    start_date=datetime(2020, 1, 1),  # required for tasks to be scheduled
    schedule_interval=None,
    max_active_runs=1,
    catchup=False,
) as dag:
    t_update_conf = UpdateConfOperator(
        task_id="update_conf", given_conf=CONF,
    )
    t_print_conf = BashOperator(
        task_id="print_conf",
        bash_command="echo {{ dag_run['conf'] }}",
    )
    t_update_conf >> t_print_conf
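With this in place, one way to recover from a mid-run failure is to edit CONF (or whatever feeds the templated given_conf), then clear update_conf together with its downstream tasks so they re-run and pick up the new values. A sketch with the v1.10 CLI (the execution dates here are illustrative):

airflow clear some_dag --task_regex update_conf --downstream \
    --start_date 2020-01-01 --end_date 2020-01-01 --no_confirm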

This seems like a good use-case for Airflow Variables. If you were to read your configs from Variables, you could easily see and modify the configuration inputs from the Airflow UI itself.


You can even get creative and automate that update of the config (which is now stored in a Variable) before re-running a Task / DAG, via another Airflow task itself. See With code, how do you update an airflow variable?
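A minimal sketch of that pattern; the Variable key and the DAG/task names are illustrative, not from the original post:

from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.python_operator import PythonOperator


def update_config():
    # Overwrite the stored config so later (or re-run) tasks read the new values.
    Variable.set("my_dag_config", {"field1": 3, "field2": 4}, serialize_json=True)


def use_config():
    # Read the current config back from the metadata DB (also editable in the UI).
    config = Variable.get("my_dag_config", default_var={}, deserialize_json=True)
    print("running with", config)


with DAG(
    "variable_config_dag",
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,
) as dag:
    t_update = PythonOperator(task_id="update_config", python_callable=update_config)
    t_use = PythonOperator(task_id="use_config", python_callable=use_config)
    t_update >> t_use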
