MWAA/Airflow: Importing task code into dag files from submodules
I'm using MWAA on AWS for a workflow orchestration project. I have a DAG folder structure like this (it is uploaded to S3 in the same layout, with the MWAA DAGs folder set to src/dags):
src/
├─ dags/
│ ├─ first_dag/
│ │ ├─ tasks/
│ │ │ ├─ task_01/
│ │ │ │ ├─ task.py
│ │ │ │ ├─ __init__.py
│ │ │ ├─ task_02/
│ │ │ │ ├─ task.py
│ │ │ │ ├─ __init__.py
│ │ │ ├─ __init__.py
│ │ ├─ dag.py
│ │ ├─ __init__.py
│ ├─ __init__.py
src/dags/first_dag/dag.py looks something like this:
from airflow.models import DAG
from first_dag.tasks.task_01.task import create_task as create_task_01
from first_dag.tasks.task_02.task import create_task as create_task_02
default_args = {
    'owner': 'Me'
}
with DAG(dag_id='test', schedule_interval=None, default_args=default_args) as dag:
    task_01 = create_task_01()
    task_02 = create_task_02()
    task_01 >> task_02
and src/dags/first_dag/tasks/task_01/task.py and src/dags/first_dag/tasks/task_02/task.py are basically the same file, like this:
from airflow.operators.dummy import DummyOperator
def create_task():
    task = DummyOperator(
        task_id='dummy_task'
    )
    return task
My understanding from the official Airflow docs is that the DAGs folder is automatically added to the PYTHONPATH environment variable. Yet the MWAA UI shows a DAG import error:
Broken DAG: [/usr/local/airflow/dags/first_dag/dag.py] Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/airflow/dags/first_dag/dag.py", line 2, in <module>
    from first_dag.tasks.task_01.task import create_task as create_task_01
ModuleNotFoundError: No module named 'first_dag.tasks'
I don't really understand why Python can't find the module. Does MWAA do something different with PYTHONPATH? Am I importing incorrectly? How can I resolve this?
Edit: I also tried adding the first_dag directory to a plugins.zip archive (with the dag.py file removed) to see if it would be recognized as a plugin. It still gives the ModuleNotFoundError.
Here's a similar Stack Overflow question: Set PYTHONPATH in MWAA
Does MWAA do something different with PYTHONPATH?
No, it doesn't. However, it's never explicitly set by the user.
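To see which directories the interpreter actually searches in a given environment, a small diagnostic can be dropped into a DAG file temporarily. This is only a sketch: the function name describe_import_path is hypothetical, and the output will differ between environments. Imports are resolved against sys.path; the PYTHONPATH environment variable is only one of the inputs used to build sys.path at interpreter startup.

```python
import os
import sys


def describe_import_path():
    """Collect the values that influence where Python looks for modules."""
    return {
        # May be None if the variable is not set in this process.
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
        # The list the import system actually walks, in order.
        "sys.path": list(sys.path),
    }


info = describe_import_path()
print("PYTHONPATH =", info["PYTHONPATH"])
for entry in info["sys.path"]:
    print("sys.path entry:", entry)
```

The printed entries show which top-level names are importable: a package resolves only if its parent directory appears in that list.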
Am I importing incorrectly? How can I resolve?
dag.py is itself inside the first_dag sub-package, so the first_dag prefix can be left out of the import statements:
from tasks.task_01.task import create_task as create_task_01
from tasks.task_02.task import create_task as create_task_02
This is explained more in this Stack Overflow question, which is also directly referenced in the Airflow documentation.
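The underlying rule is plain Python rather than anything specific to MWAA: an absolute import succeeds only if the first component of the dotted name sits directly inside a directory on sys.path. The sketch below rebuilds the layout from the question in a temporary directory (with a stand-in task.py that returns a string, so Airflow is not needed) and probes which import form works for which sys.path entry. The helper can_import is hypothetical, and the sketch makes no claim about which directory MWAA itself puts on sys.path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway copy of the layout from the question.
root = Path(tempfile.mkdtemp())
dags = root / "src" / "dags"
for name in ("task_01", "task_02"):
    pkg = dags / "first_dag" / "tasks" / name
    pkg.mkdir(parents=True)
    # Stand-in for the real task.py; returns a string instead of an operator.
    (pkg / "task.py").write_text(f"def create_task():\n    return '{name}'\n")
    (pkg / "__init__.py").touch()
(dags / "first_dag" / "tasks" / "__init__.py").touch()
(dags / "first_dag" / "__init__.py").touch()
(dags / "__init__.py").touch()


def can_import(module_name, path_entry):
    """Return True if module_name imports with path_entry prepended to sys.path."""
    saved_path = list(sys.path)
    saved_modules = set(sys.modules)
    sys.path.insert(0, str(path_entry))
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
        return True
    except ModuleNotFoundError:
        return False
    finally:
        # Restore sys.path and forget modules loaded by this probe.
        sys.path[:] = saved_path
        for loaded in set(sys.modules) - saved_modules:
            del sys.modules[loaded]


first_dag = dags / "first_dag"
print("dags on path, first_dag.tasks...:", can_import("first_dag.tasks.task_01.task", dags))
print("dags on path, tasks...:", can_import("tasks.task_01.task", dags))
print("first_dag on path, tasks...:", can_import("tasks.task_01.task", first_dag))
```

If the diagnostic shows that the short form works in a given environment, the directory containing tasks/ is on sys.path there; if only the long form works, the parent of first_dag/ is.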