

How to parameterize DAGs in Airflow from the UI?

Context: I've defined an Airflow DAG that performs an operation, compute_metrics, on some data for an entity based on a parameter called org. Under the hood, something like myapi.compute_metrics(org) is called. This flow will mostly be run on an ad-hoc basis.

Problem: I'd like to be able to select the org to run the flow against when I manually trigger the DAG from the Airflow UI.

The most straightforward solution I can think of is to generate n different DAGs, one for each org. The DAGs would have ids like compute_metrics_1, compute_metrics_2, etc., and when I need to trigger compute_metrics for a single org, I can pick the DAG for that org. This doesn't scale as I add orgs and more types of computation.

I've done some research, and it seems that I can create a Flask blueprint for Airflow, which, to my understanding, extends the UI. In this extended UI I can add input components, like a text box, for picking an org, and then pass that as a conf to a DagRun that is created manually by the blueprint. Is that correct? I'm imagining I could write something like:

from airflow import settings
from airflow.models import DagRun
from airflow.utils import timezone
from airflow.utils.state import State

session = settings.Session()

execution_date = timezone.utcnow()  # Airflow expects timezone-aware dates
run_id = 'external_trigger_' + execution_date.isoformat()

trigger = DagRun(
    dag_id='general_compute_metrics_needs_org_id',
    run_id=run_id,
    state=State.RUNNING,
    execution_date=execution_date,
    external_trigger=True,
    conf={'org': org_ui_component.text})  # conf is a dict; the org id comes from a component in the blueprint
session.add(trigger)
session.commit()  # I don't know if this would actually be scheduled by the scheduler
session.close()
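Whichever way the run is created, the task still has to read the org back out of the run's conf. A minimal sketch, assuming conf is a dict such as {'org': ...}: Airflow exposes the triggering run as context["dag_run"] (a PythonOperator callable receives the context via **kwargs in Airflow 2, or with provide_context=True in 1.10). The helper name org_from_context is hypothetical.

```python
def org_from_context(context, default=None):
    """Return the 'org' key from the triggering DagRun's conf, if present.

    context is the task context dict; context["dag_run"].conf holds whatever
    was passed when the run was triggered, or None if nothing was passed.
    """
    dag_run = context.get("dag_run")
    conf = getattr(dag_run, "conf", None) or {}  # treat a missing/None conf as empty
    return conf.get("org", default)
```

Inside a templated field the same value is reachable as {{ dag_run.conf["org"] }}.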

Is my idea sound? Is there a better way to achieve what I want?

> I've done some research, and it seems that I can create a Flask blueprint for Airflow, which, to my understanding, extends the UI.

The blueprint extends the API. If you want a UI for it, you'll need to serve a template view. The most feature-complete way to achieve this is to develop your own Airflow plugin.

If you want to create DagRuns manually, you can use this trigger as a reference. For simplicity, though, I'd trigger the DAG with the API.
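A minimal sketch of triggering through the API with only the standard library. It assumes Airflow 2's stable REST API (POST /api/v1/dags/{dag_id}/dagRuns with a JSON body containing "conf"), a webserver at a hypothetical base URL, and an API auth backend that accepts basic auth (e.g. airflow.api.auth.backend.basic_auth). On Airflow 1.10 the experimental endpoint is /api/experimental/dags/<dag_id>/dag_runs instead.

```python
import base64
import json
import urllib.request


def build_trigger_request(base_url, dag_id, org, user, password):
    """Build a POST request that starts dag_id with {"org": org} as the run conf."""
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": {"org": org}}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",  # assumes basic auth is enabled
        },
    )


def trigger_dag(base_url, dag_id, org, user, password):
    """Send the trigger request and return the decoded JSON response."""
    req = build_trigger_request(base_url, dag_id, org, user, password)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The same run can be started from the command line with airflow dags trigger -c '{"org": "acme"}' compute_metrics (Airflow 2) or airflow trigger_dag -c '{"org": "acme"}' compute_metrics (1.10).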

As for your specific problem, I would have a single DAG, compute_metrics, that reads the org from an Airflow Variable. Variables are global and can be set dynamically. You can prefix the variable name with something like the DagRun id to make it unique per run, so concurrent runs of the DAG don't overwrite each other's org.
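A minimal sketch of the per-run variable idea. To keep it runnable without an Airflow installation, a plain dict stands in for the Variable store; in a real deployment you would call airflow.models.Variable.set(key, value) and Variable.get(key) instead. The key format and the function names are hypothetical. The caller needs to know the run id before the run starts; it can choose one itself when triggering (Airflow 2's REST API accepts a dag_run_id field), set the variable, and then trigger.

```python
# Stand-in for Airflow's Variable store so the sketch runs without Airflow.
# Replace variable_set/variable_get with Variable.set/Variable.get in a real DAG.
_variable_store = {}


def variable_set(key, value):
    _variable_store[key] = value


def variable_get(key):
    return _variable_store[key]


def org_variable_key(run_id):
    """Per-run key (hypothetical scheme): prefixing with the run id keeps
    concurrent runs of the same DAG from clobbering each other's org."""
    return f"{run_id}__compute_metrics_org"


def compute_metrics_task(run_id, compute_metrics):
    """Task body: look up this run's org, then call the metrics function."""
    org = variable_get(org_variable_key(run_id))
    return compute_metrics(org)
```

Usage: call variable_set(org_variable_key(run_id), "acme") before triggering the run with that run_id; inside the task, run_id is available from the task context.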

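Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.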

© 2020-2024 STACKOOM.COM