How to parameterize DAGs in Airflow from the UI?
Context: I've defined an Airflow DAG which performs an operation, compute_metrics, on some data for an entity based on a parameter called org. Underneath, something like myapi.compute_metrics(org) is called. This flow will mostly be run on an ad-hoc basis.
Problem: I'd like to be able to select the org to run the flow against when I manually trigger the DAG from the Airflow UI.
The most straightforward solution I can think of is to generate n different DAGs, one for each org. The DAGs would have ids like compute_metrics_1, compute_metrics_2, etc., and then when I need to trigger compute metrics for a single org, I can pick the DAG for that org. This doesn't scale as I add orgs and as I add more types of computation.
I've done some research and it seems that I can create a Flask blueprint for Airflow, which, to my understanding, extends the UI. In this extended UI I can add input components, like a text box, for picking an org and then pass that as a conf to a DagRun which is manually created by the blueprint. Is that correct? I'm imagining I could write something like:
    from datetime import datetime

    from airflow import settings
    from airflow.models import DagRun
    from airflow.utils.state import State

    session = settings.Session()
    execution_date = datetime.now()
    run_id = 'external_trigger_' + execution_date.isoformat()
    trigger = DagRun(
        dag_id='general_compute_metrics_needs_org_id',
        run_id=run_id,
        state=State.RUNNING,
        execution_date=execution_date,
        external_trigger=True,
        conf=org_ui_component.text,  # pass the org id from a component in the blueprint
    )
    session.add(trigger)
    session.commit()  # I don't know if this would actually be scheduled by the scheduler
Is my idea sound? Is there a better way to achieve what I want?
"I've done some research and it seems that I can create a Flask blueprint for Airflow, which, to my understanding, extends the UI."
The blueprint extends the API. If you want some UI for it, you'll need to serve a template view. The most feature-complete way of achieving this is developing your own Airflow Plugin.
If you want to manually create DagRuns, you can use this trigger as reference. For simplicity, I'd trigger a DAG with the API.
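Triggering through the API could look roughly like the sketch below. It assumes the stable REST API of Airflow 2.x (`POST /api/v1/dags/{dag_id}/dagRuns`) with basic auth enabled; the base URL, the credentials, and the `org` key inside `conf` are placeholders chosen for this example, not something taken from the question. On Airflow 1.10 the experimental endpoint (`/api/experimental/dags/<dag_id>/dag_runs`) plays the same role.

```python
import base64
import json
import urllib.request


def build_trigger_request(base_url, dag_id, org, user, password):
    """Build (but do not send) a request that triggers one DAG run.

    Assumes the Airflow 2.x stable REST API with basic auth enabled.
    The "org" key inside conf is a naming choice for this example.
    """
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": {"org": org}}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def trigger(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Inside the DAG, the task callable can then read the value back from the run's conf, for example `context["dag_run"].conf["org"]` in a PythonOperator callable (on Airflow 1.x the operator needs `provide_context=True` for that).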
As for your specific problem, I would have a single DAG, compute_metrics, that reads the org from an Airflow Variable. Variables are global and can be set dynamically. You can prefix the variable name with something like the DagRun id to make it unique and thus safe for concurrent DAG runs.
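A minimal sketch of the run-scoped Variable idea follows. The `org__<run_id>` key format and the helper names are made up for illustration; in a real DAG the getter would be `airflow.models.Variable.get`, which is passed in as a parameter here so the logic can be read and tried without an Airflow installation.

```python
def org_variable_key(run_id):
    """Variable name scoped to one DagRun, so concurrent runs don't collide."""
    return f"org__{run_id}"


def resolve_org(run_id, get_variable):
    """Look up the org for a given run.

    get_variable is any callable taking a key and returning its value;
    inside Airflow you would pass airflow.models.Variable.get.
    """
    return get_variable(org_variable_key(run_id))


def compute_metrics_task(run_id, get_variable, compute_metrics):
    """Task body: resolve the org, then run the computation for it.

    compute_metrics stands in for myapi.compute_metrics from the question.
    """
    org = resolve_org(run_id, get_variable)
    return compute_metrics(org)
```

Whoever triggers the run has to set the matching variable first (for example with `Variable.set(key, org)`), and stale keys should be cleaned up once the run has finished.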