MWAA/Airflow: Importing task code into dag files from submodules

I'm using MWAA on AWS for a workflow orchestration project. I have a DAG folder structure like this (it gets uploaded to S3 with the same layout, and the MWAA DAGs folder is set to src/dags):


src/
├─ dags/
│  ├─ first_dag/
│  │  ├─ tasks/
│  │  │  ├─ task_01/
│  │  │  │  ├─ task.py
│  │  │  │  ├─ __init__.py
│  │  │  ├─ task_02/
│  │  │  │  ├─ task.py
│  │  │  │  ├─ __init__.py
│  │  │  ├─ __init__.py
│  │  ├─ dag.py
│  │  ├─ __init__.py
│  ├─ __init__.py

src/dags/first_dag/dag.py looks something like this:

from airflow.models import DAG
from first_dag.tasks.task_01.task import create_task as create_task_01
from first_dag.tasks.task_02.task import create_task as create_task_02

default_args = {
    'owner': 'Me'
}

with DAG(dag_id='test', schedule_interval=None, default_args=default_args) as dag:
    task_01 = create_task_01()
    task_02 = create_task_02()

    task_01 >> task_02

and src/dags/first_dag/tasks/task_01/task.py and src/dags/first_dag/tasks/task_02/task.py are essentially the same file, like this:

from airflow.operators.dummy import DummyOperator


def create_task():
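    task = DummyOperator(
        # task_02/task.py needs a different task_id (e.g. 'dummy_task_02'):
        # two tasks with the same task_id in one DAG raise DuplicateTaskIdFound.
        task_id='dummy_task'
    )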
    return task

My understanding from the official Airflow docs is that the DAGs folder is automatically added to the PYTHONPATH environment variable. Yet the MWAA UI shows a DAG import error:

Broken DAG: [/usr/local/airflow/dags/first_dag/dag.py] Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/airflow/dags/first_dag/dag.py", line 2, in <module>
    from first_dag.tasks.task_01.task import create_task as create_task_01
ModuleNotFoundError: No module named 'first_dag.tasks'

I don't really understand why Python can't find the module. Does MWAA do something different with PYTHONPATH? Am I importing incorrectly? How can I resolve this?

Edit: I also tried adding the first_dag directory to a plugins.zip archive (removing the dag.py file) to see whether it would be recognized as a plugin, but it still gives the ModuleNotFoundError.
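If you retry the plugins.zip route, one thing worth ruling out is the archive layout: Python can import first_dag from an extracted archive only if first_dag/ itself is a top-level entry, not nested under src/dags/. The sketch below assumes MWAA extracts plugins.zip into a folder that ends up on sys.path; build_plugins_zip is a hypothetical helper built on the standard-library zipfile module.

```python
import os
import zipfile


def build_plugins_zip(package_dir: str, zip_path: str) -> list:
    """Zip package_dir so that its own folder name (e.g. first_dag/) is the
    top-level entry in the archive, not the path leading up to it."""
    package_dir = os.path.abspath(package_dir)
    # Archive names are computed relative to the package's parent folder,
    # so "src/dags/first_dag/tasks/..." is stored as "first_dag/tasks/...".
    root = os.path.dirname(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, dirnames, filenames in os.walk(package_dir):
            # Skip bytecode caches; they are not needed in the archive.
            dirnames[:] = [d for d in dirnames if d != "__pycache__"]
            for name in filenames:
                full = os.path.join(dirpath, name)
                zf.write(full, arcname=os.path.relpath(full, root))
    # Return the stored names so the layout can be checked before uploading.
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()
```

Checking the returned names for entries like first_dag/tasks/task_01/task.py (and no src/ prefix) is a quick way to confirm the layout before uploading to S3.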

Here's a similar Stack Overflow question: Set PYTHONPATH in MWAA

"Does MWAA do something different with PYTHONPATH?"

No, it doesn't. However, PYTHONPATH is never something the user sets explicitly.
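To check what the Python process that parses dag.py actually searches, you could temporarily log sys.path and test each dotted prefix of the failing import. A minimal sketch using only the standard library; diagnose_import is a hypothetical helper name, not an Airflow or MWAA API.

```python
import importlib.util
import sys


def diagnose_import(module_name: str) -> dict:
    """Report the sys.path entries Python searches and, for each dotted
    prefix of module_name, where it was found (None if not found)."""
    report = {"sys_path": list(sys.path), "found": {}}
    parts = module_name.split(".")
    for i in range(1, len(parts) + 1):
        prefix = ".".join(parts[:i])
        try:
            spec = importlib.util.find_spec(prefix)
        except ModuleNotFoundError:
            # find_spec imports the parent first and raises if the parent
            # is not a package; treat that the same as "not found".
            spec = None
        # Namespace packages (no __init__.py) have no origin file.
        report["found"][prefix] = None if spec is None else (spec.origin or "namespace package")
        if spec is None:
            break  # deeper prefixes cannot resolve either
    return report
```

For example, logging diagnose_import("first_dag.tasks.task_01.task") from dag.py would show whether first_dag resolves at all, which file it resolves to, and at which prefix the lookup stops.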

"Am I importing incorrectly? How can I resolve this?"

dag.py is already inside the first_dag sub-package, so first_dag can be left out of the import statement:

from tasks.task_01.task import create_task as create_task_01
from tasks.task_02.task import create_task as create_task_02
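If the short import still fails in your environment, a defensive variant is to put dag.py's own folder on sys.path explicitly before importing from tasks. A sketch under that assumption; ensure_on_sys_path is a hypothetical helper, not part of Airflow or MWAA, and the dag.py usage in the comment is illustrative rather than taken from the answer.

```python
import os
import sys


def ensure_on_sys_path(directory: str) -> None:
    """Prepend directory to sys.path once, so top-level packages inside it
    (e.g. tasks/) become importable."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)


# Hypothetical use at the top of dag.py, before the task imports:
#   ensure_on_sys_path(os.path.dirname(__file__))
#   from tasks.task_01.task import create_task as create_task_01
```

One caveat applies to this sketch and to the short tasks.… import alike: Python caches modules by name in sys.modules, so if two DAG folders each contain a top-level tasks package, whichever is imported first in a given process is reused for the other.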

This is explained in more detail in this Stack Overflow question, which is also referenced directly in the Airflow documentation.
