How to manually run an Airflow DAG on a particular directory
I am evaluating whether Airflow is suitable for my needs (in bioinformatics). I am having some difficulty with the Airflow model. Specifically:
Here is an example of what I would like to execute. Say I just received some data as a directory containing 20 files on a shared filesystem. I want to execute a DAG pipeline that runs a particular bash command on each of the 20 files, then combines some of the results and performs further processing. The DAG needs the path on the filesystem, and it also needs to list the files in that directory to construct a task for each one.
It's probably not necessary for me to pass metadata from one task to another (which I understand is possible through XCom), as long as I can dynamically construct the entire DAG upfront. But it's not clear to me how I can pass a path to the DAG construction.
Put another way, I'd like my DAG definition to include something like:
    dag = DAG(...)
    for file in glob(input_path):
        t = BashOperator(..., dag=dag)
How do I get input_path passed in when I want to manually trigger a DAG?
I also don't really need cron-style scheduling.
Regarding input_path, you can pass it to the DAG using Airflow Variables. Example of code used in the DAG file:
    from airflow.models import Variable
    input_path = Variable.get("INPUT_PATH")
Variables can be imported using the Airflow CLI or set manually through the UI.
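As a rough sketch of how the Variable could feed the per-file loop from the question: note that calling `glob` on a bare directory path returns only the directory itself, so a wildcard has to be appended to get the files inside it. The helper names `list_input_files` and `get_input_path` are made up for illustration, and the `default_var` argument assumes the Airflow 2.x `Variable.get` signature.

```python
import glob
import os


def list_input_files(input_path, pattern="*"):
    """Return the sorted regular files in input_path that match pattern."""
    matches = glob.glob(os.path.join(input_path, pattern))
    # Skip subdirectories so that each remaining entry maps to one task.
    return sorted(p for p in matches if os.path.isfile(p))


def get_input_path(default=None):
    """Read the INPUT_PATH Variable (hypothetical helper, Airflow 2.x assumed)."""
    # Imported lazily: this call needs Airflow and an initialised metadata DB.
    from airflow.models import Variable

    return Variable.get("INPUT_PATH", default_var=default)
```

In the DAG file, `for file in list_input_files(get_input_path()):` would then take the place of `for file in glob(input_path):`. With the Airflow 2.x CLI the Variable can be set before triggering using `airflow variables set INPUT_PATH <directory>`.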
You should use a subdag for this type of logic:
    dag = DAG(...)
    for file in glob(input_path):
        t = BashOperator(..., dag=dag)
SubDAGs are perfect for repeating patterns. Defining a function that returns a DAG object is a nice design pattern when using Airflow.
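A minimal sketch of the "function that returns a DAG" pattern, under some assumptions: the names `plan_tasks` and `make_dag`, the `wc -l` command, and the `echo combine` step are placeholders rather than anything from the question, and the import paths are the Airflow 2.x ones. Passing `schedule_interval=None` means the DAG only runs when triggered manually, which matches the "no cron-style scheduling" requirement.

```python
import os
import re
import shlex
from datetime import datetime


def plan_tasks(files, command):
    """Map a unique, Airflow-safe task_id to the bash command for each file."""
    plan = {}
    for index, path in enumerate(files):
        # task_id may contain only letters, digits, '_', '-' and '.'.
        safe_name = re.sub(r"[^A-Za-z0-9_.-]", "_", os.path.basename(path))
        plan[f"process_{index}_{safe_name}"] = f"{command} {shlex.quote(path)}"
    return plan


def make_dag(dag_id, files, command="wc -l", combine_command="echo combine"):
    """Return a DAG with one BashOperator per file, all feeding a combine step."""
    # Lazy imports keep plan_tasks usable without Airflow (2.x paths assumed).
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # schedule_interval=None: no schedule, the DAG runs only when triggered.
    dag = DAG(dag_id, start_date=datetime(2020, 1, 1), schedule_interval=None)
    combine = BashOperator(task_id="combine", bash_command=combine_command, dag=dag)
    for task_id, bash_command in plan_tasks(files, command).items():
        # Every per-file task must finish before the combine step starts.
        BashOperator(task_id=task_id, bash_command=bash_command, dag=dag) >> combine
    return dag
```

Airflow only picks up DAG objects it finds at module level, so the DAG file would still need an assignment such as `dag = make_dag("process_directory", files)`, where `files` is the list built from input_path.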