
Include time in Airflow DAG

I am new to Airflow. I want to create a DAG that runs every hour. It has a process importdata, which imports files, followed by a process sendReport, which sends us a report twice a day, at 8am and 6pm. How can I include the times?

EDIT: I missed the part of your question where you said twice a day. Note that you can't ask for something to happen at exactly 8AM or 6PM, because execution depends on system resources. The DAG may actually run at 8:20, 8:32, etc. However, since we are scheduling an hourly job, we know there should be exactly one run between 8AM and 9AM (and likewise between 6PM and 7PM), so we just check whether the run falls within one of those timeframes and, if so, execute sendReport. Updated code:

import datetime
from airflow.models import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import ShortCircuitOperator
from airflow.utils.edgemodifier import Label


def time_in_range(start, end, x):
    """Return true if x is in the range [start, end]"""
    if start <= end:
        return start <= x <= end
    else:
        return start <= x or x <= end


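def shortcircuit_fn():
    """Return True only when the current local time is within 08:00-09:00 or 18:00-19:00."""
    # Read the clock once so both window checks compare against the same value
    now = datetime.datetime.now().time()
    return time_in_range(
        datetime.time(8, 0, 0), datetime.time(9, 0, 0), now
    ) or time_in_range(
        datetime.time(18, 0, 0), datetime.time(19, 0, 0), now
    )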


with DAG(
    dag_id="with_short_circuit_twice_a_day",
    schedule_interval='@hourly',
    start_date=datetime.datetime(2021, 7, 17),
    catchup=False
) as dag:

    first_op = DummyOperator(task_id='importdata') # Replace with your operator
    short_op = ShortCircuitOperator(task_id='short_circuit',
                                    python_callable=shortcircuit_fn
                                    )
    send_op = DummyOperator(task_id='sendReport') # Replace with your operator

    first_op >> short_op >> Label("8<time<9 or 18<time<19") >> send_op

Executing outside of time window: [screenshot]

Executing in time window: [screenshot]
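The check inside `shortcircuit_fn` can be tried out without Airflow if the current time is passed in as an argument. The following is a minimal sketch under that assumption. `REPORT_WINDOWS` and `in_report_window` are illustrative names that do not appear in the answer above, and the windows are treated as half-open (08:00 up to but not including 09:00), which is a choice made for this sketch rather than the behavior of the code above.

```python
import datetime

# Hypothetical names for illustration; not part of the original answer.
REPORT_WINDOWS = [
    (datetime.time(8, 0), datetime.time(9, 0)),
    (datetime.time(18, 0), datetime.time(19, 0)),
]


def in_report_window(now, windows=REPORT_WINDOWS):
    """Return True if `now` (a datetime.time) falls in any half-open window [start, end)."""
    return any(start <= now < end for start, end in windows)


if __name__ == "__main__":
    # Try a few sample times of day
    for t in (datetime.time(8, 20), datetime.time(12, 0), datetime.time(18, 32)):
        print(t, in_report_window(t))
```

Inside the DAG, `shortcircuit_fn` could then simply return `in_report_window(datetime.datetime.now().time())`. Keep in mind that `datetime.datetime.now()` without a timezone argument returns the local time of the machine running the task.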

Previous Answer: This is relevant only if you want to send the report for all runs between 8AM and 6PM.

There are two ways to get this functionality. You can use BranchDateTimeOperator to check if the job runs within the desired timeframe:

import datetime
from airflow.models import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.datetime import BranchDateTimeOperator
from airflow.utils.edgemodifier import Label

with DAG(
    dag_id="with_branching",
    schedule_interval='@hourly',
    start_date=datetime.datetime(2021, 7, 17),
    catchup=False
) as dag:

    first_op = DummyOperator(task_id='importdata') # Replace with your operator
    branch_op = BranchDateTimeOperator(
        task_id='branch',
        follow_task_ids_if_true='sendReport',
        follow_task_ids_if_false='do_nothing',
        target_upper=datetime.time(18, 0, 0),
        target_lower=datetime.time(8, 0, 0),
    )
    in_range_op = DummyOperator(task_id='sendReport') # Replace with your operator
    out_of_range_op = DummyOperator(task_id='do_nothing')

    first_op >> branch_op >> Label("8<time<18") >> in_range_op
    branch_op >> Label("rest of day") >> out_of_range_op

Executing outside of time window: [screenshot]

Executing in time window: [screenshot]
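To make the branching decision concrete, here is a rough plain-Python model of it. This is only a sketch and not the operator's actual implementation: `choose_branch` is a hypothetical helper, and it assumes inclusive bounds and a window that does not cross midnight.

```python
import datetime


def choose_branch(now, lower=datetime.time(8, 0), upper=datetime.time(18, 0)):
    """Illustrative model: return the task_id to follow for a given time of day.

    Assumes lower <= upper and inclusive bounds (an assumption of this sketch).
    """
    return "sendReport" if lower <= now <= upper else "do_nothing"


if __name__ == "__main__":
    print(choose_branch(datetime.time(10, 15)))
    print(choose_branch(datetime.time(21, 0)))
```

With the task ids used in the DAG above, a run at 10:15 would follow `sendReport` and a run at 21:00 would follow `do_nothing`.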

This is a good solution if you need to actually do different tasks in each branch of the workflow. If this is not the case, then you should probably use the second option, ShortCircuitOperator. With that solution the workflow continues to sendReport only if the time criteria are met; if they are not met, sendReport is skipped:

import datetime
from airflow.models import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import ShortCircuitOperator
from airflow.utils.edgemodifier import Label


def time_in_range(start, end, x):
    """Return true if x is in the range [start, end]"""
    if start <= end:
        return start <= x <= end
    else:
        return start <= x or x <= end


def shortcircuit_fn():
    return time_in_range(datetime.time(8, 0, 0),
                         datetime.time(18, 0, 0),
                         datetime.datetime.now().time(),
                         )


with DAG(
    dag_id="with_short_circuit",
    schedule_interval='@hourly',
    start_date=datetime.datetime(2021, 7, 17),
    catchup=False
) as dag:

    first_op = DummyOperator(task_id='importdata') # Replace with your operator
    short_op = ShortCircuitOperator(task_id='short_circuit',
                                    python_callable=shortcircuit_fn
                                    )
    send_op = DummyOperator(task_id='sendReport') # Replace with your operator

    first_op >> short_op >> Label("8<time<18") >> send_op

Executing outside of time window: [screenshot]

Executing in time window: [screenshot]
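The `else` branch of `time_in_range` is not exercised by the 8AM to 6PM window. It only matters when `start` is later than `end`, that is, when the window wraps past midnight. A small sketch with a hypothetical overnight window (22:00 to 02:00, chosen purely for illustration) shows how that branch behaves:

```python
import datetime


def time_in_range(start, end, x):
    """Return True if x is in the range [start, end], allowing ranges that cross midnight."""
    if start <= end:
        return start <= x <= end
    # start > end means the window wraps past midnight, e.g. 22:00 to 02:00
    return start <= x or x <= end


# Hypothetical overnight window, used only to illustrate the wrap-around case
night_start = datetime.time(22, 0)
night_end = datetime.time(2, 0)

print(time_in_range(night_start, night_end, datetime.time(23, 30)))  # True
print(time_in_range(night_start, night_end, datetime.time(1, 15)))   # True
print(time_in_range(night_start, night_end, datetime.time(12, 0)))   # False
```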
