
Include time in Airflow DAG

I am new to Airflow. I want to create a DAG that runs every hour. It has a process, importdata, which imports files, followed by a process, sendReport, which should send us a report twice a day, at 8AM and 6PM. How can I include these times?

EDIT: I missed the part of your question where you said twice a day. Note that you can't ask for something to happen at exactly 8AM or 6PM, because the actual start time depends on system resources: the run may really start at 8:20, 8:32, etc. However, since the job is scheduled hourly, we know there should be exactly one run between 8AM and 9AM, so we just check whether the current time falls in that window (or in the 6PM to 7PM window) and, if so, execute sendReport. Updated code:

import datetime
from airflow.models import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import ShortCircuitOperator
from airflow.utils.edgemodifier import Label


def time_in_range(start, end, x):
    """Return true if x is in the range [start, end]"""
    if start <= end:
        return start <= x <= end
    else:
        return start <= x or x <= end


def shortcircuit_fn():
    return time_in_range(
        datetime.time(8, 0, 0),
        datetime.time(9, 0, 0),
        datetime.datetime.now().time(),
    ) or time_in_range(
        datetime.time(18, 0, 0),
        datetime.time(19, 0, 0),
        datetime.datetime.now().time(),
    )


with DAG(
    dag_id="with_short_circuit_twice_a_day",
    schedule_interval='@hourly',
    start_date=datetime.datetime(2021, 7, 17),
    catchup=False
) as dag:

    first_op = DummyOperator(task_id='importdata') # Replace with your operator
    short_op = ShortCircuitOperator(task_id='short_circuit',
                                    python_callable=shortcircuit_fn
                                    )
    send_op = DummyOperator(task_id='sendReport') # Replace with your operator

    first_op >> short_op >> Label("8<time<9 or 18<time<19") >> send_op

Executing outside of time window: [screenshot not included]

Executing in time window: [screenshot not included]
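
The short-circuit callable above reads the wall clock inside the function, which makes the window logic awkward to check in isolation. Here is a minimal sketch of one way to separate the two, assuming a hypothetical helper `should_send_report` and a hypothetical `REPORT_HOURS` tuple (neither is part of Airflow or of the code above). It compares only the hour, so each window is half-open, and 09:00:00 sharp is not counted as part of the 8AM window:

```python
import datetime

# Hypothetical: hours of the day whose hourly run should send the report.
REPORT_HOURS = (8, 18)


def should_send_report(now, report_hours=REPORT_HOURS):
    """Return True if `now` falls in [H:00, H+1:00) for some H in report_hours."""
    return now.hour in report_hours


def shortcircuit_fn():
    # Same role as the callable above: True lets sendReport run, False skips it.
    return should_send_report(datetime.datetime.now().time())
```

`shortcircuit_fn` can be passed as `python_callable` to ShortCircuitOperator exactly as before, while `should_send_report` can be called with any `datetime.time` value to verify the windows.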

Previous answer: this is relevant only if you want to send the report for all runs between 8AM and 6PM.

There are two ways to get this functionality. You can use BranchDateTimeOperator to check whether the job runs within the desired timeframe:

import datetime
from airflow.models import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.datetime import BranchDateTimeOperator
from airflow.utils.edgemodifier import Label

with DAG(
    dag_id="with_branching",
    schedule_interval='@hourly',
    start_date=datetime.datetime(2021, 7, 17),
    catchup=False
) as dag:

    first_op = DummyOperator(task_id='importdata') # Replace with your operator
    branch_op = BranchDateTimeOperator(
        task_id='branch',
        follow_task_ids_if_true='sendReport',
        follow_task_ids_if_false='do_nothing',
        target_upper=datetime.time(18, 0, 0),
        target_lower=datetime.time(8, 0, 0),
    )
    in_range_op = DummyOperator(task_id='sendReport') # Replace with your operator
    out_of_range_op = DummyOperator(task_id='do_nothing')

    first_op >> branch_op >> Label("8<time<18") >> in_range_op
    branch_op >> Label("rest of day") >> out_of_range_op

Executing outside of time window: [screenshot not included]

Executing in time window: [screenshot not included]
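
To make the branching decision concrete, here is a plain-Python sketch of the choice the DAG above is configured to make. It is not the operator's implementation: `choose_branch` is a hypothetical function, and treating both bounds as inclusive is an assumption made for illustration.

```python
import datetime

# Same bounds as target_lower / target_upper in the DAG above.
LOWER = datetime.time(8, 0, 0)
UPPER = datetime.time(18, 0, 0)


def choose_branch(now, lower=LOWER, upper=UPPER):
    """Return the task_id to follow for a given time of day (hypothetical helper)."""
    if lower <= now <= upper:
        return "sendReport"
    return "do_nothing"


print(choose_branch(datetime.time(12, 0)))  # sendReport
print(choose_branch(datetime.time(20, 0)))  # do_nothing
```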

This is a good solution if you actually need to do different tasks in each branch of the workflow. If that is not the case, you should probably use the second option, ShortCircuitOperator. With that solution, the workflow continues to sendReport only if the time criterion is met; if it isn't met, sendReport is skipped:

import datetime
from airflow.models import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import ShortCircuitOperator
from airflow.utils.edgemodifier import Label


def time_in_range(start, end, x):
    """Return true if x is in the range [start, end]"""
    if start <= end:
        return start <= x <= end
    else:
        return start <= x or x <= end


def shortcircuit_fn():
    return time_in_range(datetime.time(8, 0, 0),
                         datetime.time(18, 0, 0),
                         datetime.datetime.now().time(),
                         )


with DAG(
    dag_id="with_short_circuit",
    schedule_interval='@hourly',
    start_date=datetime.datetime(2021, 7, 17),
    catchup=False
) as dag:

    first_op = DummyOperator(task_id='importdata') # Replace with your operator
    short_op = ShortCircuitOperator(task_id='short_circuit',
                                    python_callable=shortcircuit_fn
                                    )
    send_op = DummyOperator(task_id='sendReport') # Replace with your operator

    first_op >> short_op >> Label("8<time<18") >> send_op

Executing outside of time window: [screenshot not included]

Executing in time window: [screenshot not included]
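
The `time_in_range` helper also handles a window that wraps past midnight (the `else` branch, where start is later than end). A small standalone sketch of both cases follows; the overnight window of 22:00 to 02:00 is a hypothetical example, not something the question asks for:

```python
import datetime


def time_in_range(start, end, x):
    """Return True if x is in the range [start, end], allowing wrap past midnight."""
    if start <= end:
        return start <= x <= end
    return start <= x or x <= end


# Daytime window, as used in the DAG above.
day = (datetime.time(8, 0), datetime.time(18, 0))
print(time_in_range(*day, datetime.time(12, 30)))  # True
print(time_in_range(*day, datetime.time(19, 0)))   # False

# Hypothetical overnight window: 22:00 to 02:00 spans midnight.
night = (datetime.time(22, 0), datetime.time(2, 0))
print(time_in_range(*night, datetime.time(23, 30)))  # True
print(time_in_range(*night, datetime.time(1, 0)))    # True
print(time_in_range(*night, datetime.time(12, 0)))   # False
```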
