
Airflow - find dag run of specific dag id by execution date without time

I want to find all dag runs of a specific dag for a specific execution date.

As I read in the documentation, there is this function:

dag_runs = DagRun.find(dag_id=self.dag_name, execution_date=datetime.now())

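The problem with this is that the time also needs to match exactly. Is there any way to pass only the date and retrieve all runs, regardless of the time of day?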

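I know I could fetch the dag runs and then filter them down to the desired day, but I want something more efficient that doesn't load all the records from the database.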
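To illustrate what "filtering in the database" means here, the following is a minimal sketch using Python's built-in sqlite3 module and a hypothetical, heavily simplified dag_run table (only dag_id and execution_date, stored as ISO-8601 UTC strings). It is not Airflow's real schema or API; make_demo_connection and runs_on_day are illustrative names. The WHERE clause uses a half-open range from midnight to the next midnight, so only the requested day's rows leave the database, and a composite index on (dag_id, execution_date) can serve the lookup.

```python
import sqlite3
from datetime import date, datetime, time, timedelta, timezone
from typing import Iterable, List, Tuple


def make_demo_connection(rows: Iterable[Tuple[str, datetime]]) -> sqlite3.Connection:
    """Create an in-memory DB with a hypothetical, simplified dag_run table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT)")
    # Composite index so the dag_id + date-range filter need not scan the whole table
    conn.execute("CREATE INDEX ix_dag_run_dag_date ON dag_run (dag_id, execution_date)")
    conn.executemany(
        "INSERT INTO dag_run (dag_id, execution_date) VALUES (?, ?)",
        # Store aware datetimes as UTC ISO-8601 strings so they sort chronologically
        [(dag_id, ts.astimezone(timezone.utc).isoformat()) for dag_id, ts in rows],
    )
    return conn


def runs_on_day(conn: sqlite3.Connection, dag_id: str, day: date) -> List[str]:
    """Return execution dates of dag_id's runs on the given UTC day, filtered in SQL."""
    start = datetime.combine(day, time.min, tzinfo=timezone.utc)
    end = start + timedelta(days=1)  # exclusive upper bound: the next midnight
    cursor = conn.execute(
        "SELECT execution_date FROM dag_run "
        "WHERE dag_id = ? AND execution_date >= ? AND execution_date < ? "
        "ORDER BY execution_date",
        (dag_id, start.isoformat(), end.isoformat()),
    )
    return [row[0] for row in cursor.fetchall()]


if __name__ == "__main__":
    utc = timezone.utc
    demo = make_demo_connection([
        ("test", datetime(2021, 2, 28, 23, 30, tzinfo=utc)),
        ("test", datetime(2021, 3, 1, 9, 59, tzinfo=utc)),
        ("test", datetime(2021, 3, 1, 10, 1, tzinfo=utc)),
        ("test", datetime(2021, 3, 2, 0, 0, tzinfo=utc)),
        ("other", datetime(2021, 3, 1, 12, 0, tzinfo=utc)),
    ])
    print(runs_on_day(demo, "test", date(2021, 3, 1)))
    # ['2021-03-01T09:59:00+00:00', '2021-03-01T10:01:00+00:00']
```

The half-open range avoids having to pick a "last instant" of the day; a run at exactly the next midnight belongs to the next day and is excluded.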

I'm using Airflow 1.10.10 in GCP Composer, so adding a method to the DagRun class is not an option for me.

For Airflow >= 2.0.0, you can use:

dag_runs = DagRun.find(
    dag_id=your_dag_id,
    execution_start_date=your_start_date,
    execution_end_date=your_end_date,
)
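Since the question asks for a whole day rather than an exact timestamp, the start and end values can be derived from a plain date. This is a minimal sketch, assuming the day is interpreted in UTC (or another fixed-offset tzinfo) and that the bounds are applied inclusively, as the backported find below does with between. day_bounds is a hypothetical helper, not part of Airflow. Both values are timezone-aware, since the test code below notes that naive datetimes raise "ValueError: naive datetime is disallowed".

```python
from datetime import date, datetime, time, timezone, tzinfo
from typing import Tuple


def day_bounds(day: date, tz: tzinfo = timezone.utc) -> Tuple[datetime, datetime]:
    """Return timezone-aware (start, end) covering all of `day`, both ends inclusive.

    Assumes a fixed-offset tzinfo such as UTC; zones with DST may need extra care.
    """
    start = datetime.combine(day, time.min, tzinfo=tz)  # 00:00:00.000000
    end = datetime.combine(day, time.max, tzinfo=tz)    # 23:59:59.999999
    return start, end


# Hypothetical usage with Airflow >= 2.0 (not executed here):
# start, end = day_bounds(date(2021, 3, 1))
# dag_runs = DagRun.find(dag_id=your_dag_id,
#                        execution_start_date=start,
#                        execution_end_date=end)
```

Using time.max rather than the next midnight keeps a run scheduled at exactly 00:00 of the following day out of an inclusive range.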

For Airflow < 2.0.0, you can create a class MyDagRun that inherits from DagRun and backport the required functionality.

Here is the code of a working test:

from datetime import datetime
from typing import List, Optional, Union

from airflow import DAG
from airflow.models.dagrun import DagRun
from airflow.operators.python_operator import PythonOperator
from airflow.utils import timezone
from airflow.utils.db import provide_session
from sqlalchemy.orm.session import Session


class MyDagRun(DagRun):

    @staticmethod
    @provide_session
    def find(
        dag_id: Optional[Union[str, List[str]]] = None,
        run_id: Optional[str] = None,
        execution_date: Optional[datetime] = None,
        state: Optional[str] = None,
        external_trigger: Optional[bool] = None,
        no_backfills: bool = False,
        session: Session = None,
        execution_start_date: Optional[datetime] = None,
        execution_end_date: Optional[datetime] = None,
    ) -> List["DagRun"]:

        DR = MyDagRun

        qry = session.query(DR)
        dag_ids = [dag_id] if isinstance(dag_id, str) else dag_id
        if dag_ids:
            qry = qry.filter(DR.dag_id.in_(dag_ids))
        if run_id:
            qry = qry.filter(DR.run_id == run_id)
        if execution_date:
            if isinstance(execution_date, list):
                qry = qry.filter(DR.execution_date.in_(execution_date))
            else:
                qry = qry.filter(DR.execution_date == execution_date)
        if execution_start_date and execution_end_date:
            qry = qry.filter(DR.execution_date.between(execution_start_date, execution_end_date))
        elif execution_start_date:
            qry = qry.filter(DR.execution_date >= execution_start_date)
        elif execution_end_date:
            qry = qry.filter(DR.execution_date <= execution_end_date)
        if state:
            qry = qry.filter(DR.state == state)
        if external_trigger is not None:
            qry = qry.filter(DR.external_trigger == external_trigger)
        if no_backfills:
            # in order to prevent a circular dependency
            from airflow.jobs import BackfillJob
            qry = qry.filter(DR.run_id.notlike(BackfillJob.ID_PREFIX + '%'))

        return qry.order_by(DR.execution_date).all()


def func(**kwargs):
    # Need to use timezone to avoid ValueError: naive datetime is disallowed
    start = timezone.make_aware(datetime(2021, 3, 1, 9, 59, 0))  # change to your required start
    end = timezone.make_aware(datetime(2021, 3, 1, 10, 1, 0))  # change to your required end
    # find is a static method, so call it on the class; add dag_id=... to limit it to one DAG
    results = MyDagRun.find(execution_start_date=start,
                            execution_end_date=end)

    print("Execution dates met criteria are:")
    for item in results:
        print(item.execution_date)
    return results


default_args = {
    'owner': 'airflow',
    'start_date': datetime(2019, 11, 1),

}

with DAG(dag_id='test',
         default_args=default_args,
         schedule_interval=None,
         catchup=True
         ) as dag:

    op = PythonOperator(task_id="task",
                        python_callable=func)

[Screenshot: example showing 4 existing runs]

[Screenshot: output of the code above, selecting the required runs]
