
How to parameterize DAGs in Airflow from the UI?

Original post: 2018-04-04 21:35:07 | Tags: airflow, airflow-scheduler

Context: I've defined an Airflow DAG which performs an operation, compute_metrics, on some data for an entity, based on a parameter called org. Underneath, something like myapi.compute_metrics(org) is called. This flow will mostly be run on an ad-hoc basis.

Problem: I'd like to be able to select the org to run the flow against when I manually trigger the DAG from the Airflow UI.

The most straightforward solution I can think of is to generate n different DAGs, one for each org. The DAGs would have ids like compute_metrics_1, compute_metrics_2, etc., and when I need to trigger compute metrics for a single org, I can pick the DAG for that org. This doesn't scale as I add orgs and as I add more types of computation.

I've done some research and it seems that I can create a Flask blueprint for Airflow, which, to my understanding, extends the UI. In this extended UI I can add input components, like a text box, for picking an org, and then pass that as a conf to a DagRun which is manually created by the blueprint. Is that correct? I'm imagining I could write something like:

from datetime import datetime

from airflow import settings
from airflow.models import DagRun
from airflow.utils.state import State

session = settings.Session()

execution_date = datetime.now()
run_id = 'external_trigger_' + execution_date.isoformat()

# Insert a DagRun row directly; org_ui_component stands for the input widget in the blueprint
trigger = DagRun(
    dag_id='general_compute_metrics_needs_org_id',
    run_id=run_id,
    state=State.RUNNING,
    execution_date=execution_date,
    external_trigger=True,
    conf=org_ui_component.text) # pass the org id from a component in the blueprint
session.add(trigger)
session.commit() # I don't know if this would actually be scheduled by the scheduler

Is my idea sound? Is there a better way to achieve what I want?

1 Answer

> I've done some research and it seems that I can create a Flask blueprint for Airflow, which, to my understanding, extends the UI.

The blueprint extends the API. If you want a UI for it, you'll need to serve a template view. The most feature-complete way to achieve this is to develop your own Airflow Plugin.

If you want to manually create DagRuns, you can use this trigger as a reference. For simplicity, I'd trigger a DAG with the API.
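A minimal sketch of the API route, assuming the Airflow 1.x experimental REST endpoint (POST /api/experimental/dags/<DAG_ID>/dag_runs), which takes conf as a JSON-encoded string. The base URL, the function names, and the 'org' key are illustrative assumptions, not part of the original post. The second helper shows how a task callable might read the value back from dag_run.conf.

```python
import json
import urllib.request


def build_trigger_request(base_url, dag_id, org):
    """Build (but do not send) a POST request that triggers one DAG run.

    Assumes the Airflow 1.x experimental API; 'org' is a hypothetical conf key.
    """
    url = "{}/api/experimental/dags/{}/dag_runs".format(base_url.rstrip("/"), dag_id)
    # The experimental endpoint expects conf as a JSON string inside the JSON body.
    body = json.dumps({"conf": json.dumps({"org": org})}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def org_from_context(context):
    """Read the org inside a task callable from the triggering DagRun's conf.

    'context' is the keyword-argument dict Airflow passes to the callable;
    conf may be None when the run was started without one.
    """
    conf = context["dag_run"].conf or {}
    return conf.get("org")
```

Sending the request would be a matter of calling urllib.request.urlopen(build_trigger_request("http://localhost:8080", "compute_metrics", "org_1")) against a running webserver; the host and port here are placeholders.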

As for your specific problem, I would have a single DAG, compute_metrics, that reads the org from an Airflow Variable. Variables are global and can be set dynamically. You can prefix the variable name with something like the DagRun id to make it unique and thus safe for concurrent DAG runs.
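The prefixing idea could look roughly like the sketch below. The key format and helper names are assumptions made for illustration. The store argument is anything with set(key, value) and get(key); Airflow's Variable class exposes Variable.set and Variable.get with that shape, so it could be passed in directly, while the in-memory class is only a stand-in so the example runs without Airflow.

```python
def org_variable_key(run_id, name="org"):
    """Prefix the variable name with the DagRun id so each run has its own key."""
    return "{}__{}".format(run_id, name)


def set_org(store, run_id, org):
    """Store the org for one specific run before (or while) triggering it."""
    store.set(org_variable_key(run_id), org)


def get_org(store, run_id):
    """Read back the org for the run that is currently executing."""
    return store.get(org_variable_key(run_id))


class InMemoryVariables:
    """Stand-in for airflow.models.Variable, used only to make the sketch runnable."""

    def __init__(self):
        self._values = {}

    def set(self, key, value):
        self._values[key] = value

    def get(self, key):
        return self._values[key]


store = InMemoryVariables()
set_org(store, "manual_run_a", "org_1")
set_org(store, "manual_run_b", "org_2")
# Two concurrent runs of the same DAG do not overwrite each other's value.
print(get_org(store, "manual_run_a"), get_org(store, "manual_run_b"))
```

Because Variables are global, per-run keys accumulate over time, so some cleanup step after each run would likely be needed in practice.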