I am new to Airflow. I want to create a DAG that runs every hour. We have a process importdata, which imports files, followed by a process sendReport, which sends us a report twice a day, at 8am and 6pm. How can I include the times?
EDIT: I missed the part of your question where you said twice a day. Note that you can't ask for something to happen exactly at 8AM or 6PM, because execution depends on system resources; the DAG may actually run at 8:20, 8:32, etc. However, since we are scheduling an hourly job, we know there should be exactly one run between 8AM and 9AM (and likewise between 6PM and 7PM), so we just check whether the run falls inside one of those timeframes and, if so, execute sendReport. Updated code:
import datetime

from airflow.models import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import ShortCircuitOperator
from airflow.utils.edgemodifier import Label


def time_in_range(start, end, x):
    """Return true if x is in the range [start, end]"""
    if start <= end:
        return start <= x <= end
    else:
        return start <= x or x <= end


def shortcircuit_fn():
    return time_in_range(
        datetime.time(8, 0, 0),
        datetime.time(9, 0, 0),
        datetime.datetime.now().time(),
    ) or time_in_range(
        datetime.time(18, 0, 0),
        datetime.time(19, 0, 0),
        datetime.datetime.now().time(),
    )


with DAG(
    dag_id="with_short_circuit_twice_a_day",
    schedule_interval='@hourly',
    start_date=datetime.datetime(2021, 7, 17),
    catchup=False
) as dag:
    first_op = DummyOperator(task_id='importdata')  # Replace with your operator
    short_op = ShortCircuitOperator(
        task_id='short_circuit',
        python_callable=shortcircuit_fn
    )
    send_op = DummyOperator(task_id='sendReport')  # Replace with your operator

    first_op >> short_op >> Label("8<time<9 or 18<time<19") >> send_op
[Screenshot: DAG run executing outside of the time window]
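The window check can also be tried out without Airflow. The following is only a sketch under assumptions: `REPORT_WINDOWS` and `should_send_report` are hypothetical names that are not part of the answer's code, and the windows are treated as half-open (`start <= now < end`), so a run at exactly 09:00:00 is counted for the next hour rather than the 8AM window.

```python
import datetime

# Hypothetical report windows (assumption): half-open [start, end), so a run
# at exactly 09:00:00 does not count as part of the 8AM window.
REPORT_WINDOWS = [
    (datetime.time(8, 0), datetime.time(9, 0)),
    (datetime.time(18, 0), datetime.time(19, 0)),
]


def should_send_report(now, windows=REPORT_WINDOWS):
    """Return True if the time-of-day `now` falls inside any report window."""
    return any(start <= now < end for start, end in windows)


def shortcircuit_fn():
    # Plays the same role as the callable handed to ShortCircuitOperator:
    # a falsy return value causes the downstream task to be skipped.
    return should_send_report(datetime.datetime.now().time())
```

Keeping the decision in a function that takes the time as an argument makes it easy to check, for example, that an hourly job starting at minute 20 of each hour triggers the report exactly twice a day.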
Previous Answer: This is relevant only if you want to send the report for all runs between 8AM and 6PM.
There are two ways to get this functionality. You can use BranchDateTimeOperator to check whether the job runs within the desired timeframe:
import datetime

from airflow.models import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.datetime import BranchDateTimeOperator
from airflow.utils.edgemodifier import Label


with DAG(
    dag_id="with_branching",
    schedule_interval='@hourly',
    start_date=datetime.datetime(2021, 7, 17),
    catchup=False
) as dag:
    first_op = DummyOperator(task_id='importdata')  # Replace with your operator
    branch_op = BranchDateTimeOperator(
        task_id='branch',
        follow_task_ids_if_true='sendReport',
        follow_task_ids_if_false='do_nothing',
        target_upper=datetime.time(18, 0, 0),
        target_lower=datetime.time(8, 0, 0),
    )
    in_range_op = DummyOperator(task_id='sendReport')  # Replace with your operator
    out_of_range_op = DummyOperator(task_id='do_nothing')

    first_op >> branch_op >> Label("8<time<18") >> in_range_op
    branch_op >> Label("rest of day") >> out_of_range_op
[Screenshot: DAG run executing outside of the time window]
[Screenshot: DAG run executing in the time window]
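To make the branching decision concrete, here is a minimal plain-Python sketch of the choice the branch task makes. It is an illustration only, not the operator's actual implementation: `choose_branch`, `LOWER`, and `UPPER` are hypothetical names, and treating both bounds as inclusive is an assumption.

```python
import datetime

# Same bounds as target_lower / target_upper in the DAG above.
LOWER = datetime.time(8, 0, 0)
UPPER = datetime.time(18, 0, 0)


def choose_branch(now):
    """Return the task_id to follow for the time-of-day `now`."""
    if LOWER <= now <= UPPER:
        return "sendReport"   # inside the window: send the report
    return "do_nothing"       # outside the window: take the no-op branch


print(choose_branch(datetime.time(8, 20)))   # sendReport
print(choose_branch(datetime.time(18, 32)))  # do_nothing
```

A function of this shape, returning a task_id, is also what BranchPythonOperator expects as its python_callable, should the condition ever need to be more elaborate than a fixed time range.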
This is a good solution if you actually need to do different tasks in each branch of the workflow. If that is not the case, you should probably use the second option, ShortCircuitOperator. With that solution the workflow continues to sendReport only if the time criterion is met; otherwise sendReport is skipped:
import datetime

from airflow.models import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import ShortCircuitOperator
from airflow.utils.edgemodifier import Label


def time_in_range(start, end, x):
    """Return true if x is in the range [start, end]"""
    if start <= end:
        return start <= x <= end
    else:
        return start <= x or x <= end


def shortcircuit_fn():
    return time_in_range(
        datetime.time(8, 0, 0),
        datetime.time(18, 0, 0),
        datetime.datetime.now().time(),
    )


with DAG(
    dag_id="with_short_circuit",
    schedule_interval='@hourly',
    start_date=datetime.datetime(2021, 7, 17),
    catchup=False
) as dag:
    first_op = DummyOperator(task_id='importdata')  # Replace with your operator
    short_op = ShortCircuitOperator(
        task_id='short_circuit',
        python_callable=shortcircuit_fn
    )
    send_op = DummyOperator(task_id='sendReport')  # Replace with your operator

    first_op >> short_op >> Label("8<time<18") >> send_op
[Screenshot: DAG run executing outside of the time window]
[Screenshot: DAG run executing in the time window]
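One detail of time_in_range that is easy to miss is its else branch, which handles a window that wraps past midnight (start later than end). The sketch below repeats the helper so it runs on its own; the 22:00 to 02:00 window is a made-up example, not something the question asks for.

```python
import datetime


def time_in_range(start, end, x):
    """Return True if x is in the range [start, end], wrapping past midnight."""
    if start <= end:
        return start <= x <= end
    # start > end means the window spans midnight, e.g. 22:00 -> 02:00
    return start <= x or x <= end


# Hypothetical overnight window, for illustration only.
night = (datetime.time(22, 0), datetime.time(2, 0))

print(time_in_range(*night, datetime.time(23, 30)))  # True
print(time_in_range(*night, datetime.time(1, 15)))   # True
print(time_in_range(*night, datetime.time(12, 0)))   # False
```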