
How to set up multiple DAG directories in Airflow

I have different Airflow DAGs set up for different Python projects, i.e. one parent DAGs folder, /vol/dags, with subfolders for the DAGs of each project: /vol/dags/project1/project1.py and /vol/dags/project2/project2.py, where DAGS_FOLDER = /vol/dags.

For example, project1.py imports a function from another Python file in the same directory, /vol/dags/project1/mycalculator.py. But when I start the Airflow webserver, I get an ImportError:

/vol/dags/project1/$ airflow webserver -p 8080

INFO - Filling up the DagBag from /vol/dags/
ERROR - Failed to import: /vol/dags/project1/project1.py
Traceback (most recent call last):
  File "/Users/xxx/anaconda/lib/python2.7/site-packages/airflow/models.py", line 247, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/vol/dags/project1/project1.py", line 10, in <module>
    from mycalculator import *
ImportError: No module named mycalculator

I tried to import mycalculator.py into project1.py like this:

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators import PythonOperator
from datetime import datetime, timedelta
from mycalculator import *

dag = DAG(
    dag_id='project1', default_args=args,
    schedule_interval="@once")

The folder /vol/dags/project1/ is missing an __init__.py file.

This file can be empty.

Add this file, and then in project2.py you should be able to do:

from project1.mycalculator import *

See here for more info on packages: https://docs.python.org/2/tutorial/modules.html#packages
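As a rough illustration of the package approach, the sketch below builds a throwaway stand-in for /vol/dags in a temporary directory and assumes only that parent folder is on sys.path. The add function and the file contents are hypothetical. Note that on Python 3 the qualified import may also succeed without __init__.py (namespace packages), whereas the Python 2.7 setup in the question needs the file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway stand-in for /vol/dags (contents are hypothetical).
dags_folder = Path(tempfile.mkdtemp())
project1 = dags_folder / "project1"
project1.mkdir()
(project1 / "__init__.py").write_text("")  # empty marker file
(project1 / "mycalculator.py").write_text("def add(a, b):\n    return a + b\n")

# Assumption: only the parent dags folder is on sys.path.
sys.path.insert(0, str(dags_folder))
importlib.invalidate_caches()

# The bare module name is not visible from the parent folder...
try:
    importlib.import_module("mycalculator")
    bare_import_ok = True
except ImportError:
    bare_import_ok = False

# ...but the package-qualified name is.
from project1.mycalculator import add

result = add(2, 3)
print(bare_import_ok, result)
```

The point of the sketch is only that the import name has to be resolvable from a directory that is actually on the path, which here is the parent folder rather than project1/ itself.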

You can use the packaged DAG concept to keep the DAGs of different projects separate. You only need to place a zip of each project in your parent DAG folder.

This way you can easily bundle DAGs with their dependencies, and your DAG folder stays neat and clean because it only needs to contain one zip per project.

You can create a zip that looks like this:

my_dag1.py
my_dag2.py
package1/__init__.py
package1/functions.py

And your parent DAG folder can look something like this:

project1.zip
project2.zip
my_dag3.py
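One possible way to produce such an archive is with Python's standard zipfile module. The sketch below is an assumption-laden example rather than an official Airflow tool: it creates placeholder files matching the layout above in a temporary directory, then zips them so the DAG files sit at the root of the archive.

```python
import tempfile
import zipfile
from pathlib import Path

# Placeholder project tree mirroring the layout above (contents are dummies).
work = Path(tempfile.mkdtemp())
src = work / "project1"
(src / "package1").mkdir(parents=True)
(src / "my_dag1.py").write_text("# DAG definition goes here\n")
(src / "my_dag2.py").write_text("# DAG definition goes here\n")
(src / "package1" / "__init__.py").write_text("")
(src / "package1" / "functions.py").write_text("def helper():\n    return 'ok'\n")

zip_path = work / "project1.zip"
with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
    for path in sorted(src.rglob("*")):
        if path.is_file():
            # Store paths relative to src so DAG files land at the zip root.
            zf.write(path, path.relative_to(src).as_posix())

# List the archive to confirm the layout.
with zipfile.ZipFile(zip_path) as zf:
    names = sorted(zf.namelist())
print(names)
```

The resulting project1.zip would then be copied into the parent DAG folder alongside any loose DAG files.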

Same problem here.

Indeed, our imports work because, in the Airflow context, the DAGS_FOLDER has been added to the PYTHONPATH. Adding __init__.py to project1/ doesn't change anything.

A good solution could be to use relative imports, such as:

from .mycalculator import *

But relative imports cannot work right now because of how Airflow imports DAGs (as explained to me by an Airflow developer).

So for me, the simplest solution was to keep the DAG files at the root, prefixing them with 'project1_' or 'project2_', and to put libraries like mycalculator in subfolders.
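To make that layout concrete, here is a minimal simulation. It loads a root-level DAG file by path, which only loosely mimics how a scheduler might load files and is not Airflow's actual loader. The names project1_dag.py, project1_lib, add and TOTAL are all hypothetical.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

# Stand-in for /vol/dags with a library subfolder (names are hypothetical).
root = Path(tempfile.mkdtemp())
lib = root / "project1_lib"
lib.mkdir()
(lib / "__init__.py").write_text("")
(lib / "mycalculator.py").write_text("def add(a, b):\n    return a + b\n")

# Prefixed DAG file kept at the root; it imports the library by package name.
dag_file = root / "project1_dag.py"
dag_file.write_text(
    "from project1_lib.mycalculator import add\n"
    "TOTAL = add(1, 2)\n"
)

# Assumption: only the root folder is on sys.path.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Load the DAG file by its path, similar in spirit to loading a source file.
spec = importlib.util.spec_from_file_location("project1_dag", str(dag_file))
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

total = module.TOTAL
print(total)
```

Because the DAG file lives in the folder that is on the path, the subfolder import resolves without relying on relative imports.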
