Include time in Airflow DAG
I am new to Airflow. I want to create a DAG that runs every hour. It has a process, importdata, which imports files, followed by a process, sendReport, which sends us a report twice a day, at 8am and 6pm. How can I include the times?
EDIT: I missed the part of your question where you said twice a day. Note that you can't ask for something to happen exactly at 8AM or 6PM, because the moment a task actually starts depends on system resources. The DAG run may actually start at 08:20, 08:32, etc. However, since the job is scheduled hourly, we know there should be exactly one run between 8AM and 9AM, so we just check whether the current time falls inside that window and, if it does, execute sendReport. Updated code:
import datetime

from airflow.models import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import ShortCircuitOperator
from airflow.utils.edgemodifier import Label


def time_in_range(start, end, x):
    """Return true if x is in the range [start, end]"""
    if start <= end:
        return start <= x <= end
    else:
        return start <= x or x <= end


def shortcircuit_fn():
    # True only when the task runs between 08:00-09:00 or 18:00-19:00
    return time_in_range(
        datetime.time(8, 0, 0),
        datetime.time(9, 0, 0),
        datetime.datetime.now().time(),
    ) or time_in_range(
        datetime.time(18, 0, 0),
        datetime.time(19, 0, 0),
        datetime.datetime.now().time(),
    )


with DAG(
    dag_id="with_short_circuit_twice_a_day",
    schedule_interval='@hourly',
    start_date=datetime.datetime(2021, 7, 17),
    catchup=False
) as dag:
    first_op = DummyOperator(task_id='importdata')  # Replace with your operator
    short_op = ShortCircuitOperator(
        task_id='short_circuit',
        python_callable=shortcircuit_fn
    )
    send_op = DummyOperator(task_id='sendReport')  # Replace with your operator

    first_op >> short_op >> Label("8<time<9 or 18<time<19") >> send_op
[Image: executing outside of time window]
[Image: executing in time window]
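The claim that an hourly schedule gives exactly one run inside each one-hour window can be checked without Airflow. The sketch below reuses time_in_range from the answer but passes the start time in as an argument instead of calling datetime.now(), purely so it can be exercised in isolation. The helper names in_report_window and simulate_day, and the 20-minute start delay, are assumptions made for illustration.

```python
import datetime


def time_in_range(start, end, x):
    """Return True if x is in the range [start, end]."""
    if start <= end:
        return start <= x <= end
    return start <= x or x <= end


def in_report_window(now):
    """Same check as shortcircuit_fn, with the time passed in explicitly."""
    morning = time_in_range(datetime.time(8, 0), datetime.time(9, 0), now)
    evening = time_in_range(datetime.time(18, 0), datetime.time(19, 0), now)
    return morning or evening


def simulate_day(delay_minutes):
    """Return the start times of the hourly runs that would reach sendReport."""
    passed = []
    for hour in range(24):
        # Hypothetical: every hourly run starts delay_minutes past the hour.
        started = datetime.time(hour, delay_minutes)
        if in_report_window(started):
            passed.append(started)
    return passed


report_runs = simulate_day(delay_minutes=20)
print(report_runs)
```

With any non-zero delay, two runs per day pass the check. Because both ends of each window are inclusive, a run that starts at exactly 09:00:00 or 19:00:00 would also pass, so a half-open comparison may be worth considering if that edge matters.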
Previous Answer: This is relevant only if you want to send the report for all runs between 8AM and 6PM.
There are two ways to get this functionality. You can use BranchDateTimeOperator to check whether the job runs within the desired timeframe:
import datetime

from airflow.models import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.datetime import BranchDateTimeOperator
from airflow.utils.edgemodifier import Label

with DAG(
    dag_id="with_branching",
    schedule_interval='@hourly',
    start_date=datetime.datetime(2021, 7, 17),
    catchup=False
) as dag:
    first_op = DummyOperator(task_id='importdata')  # Replace with your operator
    branch_op = BranchDateTimeOperator(
        task_id='branch',
        follow_task_ids_if_true='sendReport',
        follow_task_ids_if_false='do_nothing',
        target_upper=datetime.time(18, 0, 0),
        target_lower=datetime.time(8, 0, 0),
    )
    in_range_op = DummyOperator(task_id='sendReport')  # Replace with your operator
    out_of_range_op = DummyOperator(task_id='do_nothing')

    first_op >> branch_op >> Label("8<time<18") >> in_range_op
    branch_op >> Label("rest of day") >> out_of_range_op
[Image: executing outside of time window]
[Image: executing in time window]
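To make the branching rule concrete, here is a plain-Python sketch of the decision the DAG above relies on. It is not the operator's implementation: choose_branch is a hypothetical helper, and treating both bounds as inclusive is an assumption. Only the task ids and the 08:00 and 18:00 bounds come from the code above.

```python
import datetime


def choose_branch(now, lower=datetime.time(8, 0), upper=datetime.time(18, 0)):
    """Return the task_id that would be followed for a given wall-clock time."""
    if lower <= now <= upper:
        return "sendReport"
    return "do_nothing"


for sample in (datetime.time(7, 59), datetime.time(8, 20), datetime.time(18, 1)):
    print(sample, "->", choose_branch(sample))
```

Every hourly run between the two bounds follows the sendReport branch, which is why this variant sends a report on each of those runs rather than twice a day.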
This is a good solution if you actually need to do different tasks in each branch of the workflow. If that is not the case, you should probably use the second option, ShortCircuitOperator. With that solution the workflow continues to sendReport only if the time criterion is met; if it isn't met, sendReport is skipped:
import datetime

from airflow.models import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import ShortCircuitOperator
from airflow.utils.edgemodifier import Label


def time_in_range(start, end, x):
    """Return true if x is in the range [start, end]"""
    if start <= end:
        return start <= x <= end
    else:
        return start <= x or x <= end


def shortcircuit_fn():
    # True when the task runs between 08:00 and 18:00
    return time_in_range(
        datetime.time(8, 0, 0),
        datetime.time(18, 0, 0),
        datetime.datetime.now().time(),
    )


with DAG(
    dag_id="with_short_circuit",
    schedule_interval='@hourly',
    start_date=datetime.datetime(2021, 7, 17),
    catchup=False
) as dag:
    first_op = DummyOperator(task_id='importdata')  # Replace with your operator
    short_op = ShortCircuitOperator(
        task_id='short_circuit',
        python_callable=shortcircuit_fn
    )
    send_op = DummyOperator(task_id='sendReport')  # Replace with your operator

    first_op >> short_op >> Label("8<time<18") >> send_op
[Image: executing outside of time window]
[Image: executing in time window]
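The else branch of time_in_range is never exercised by the windows used in this answer. It covers a window that wraps past midnight, where start is later than end. A small sketch, using a 22:00 to 02:00 window chosen only for illustration:

```python
import datetime


def time_in_range(start, end, x):
    """Return True if x is in the range [start, end]."""
    if start <= end:
        return start <= x <= end
    # start > end: the window wraps past midnight
    return start <= x or x <= end


night_start = datetime.time(22, 0)  # hypothetical window start
night_end = datetime.time(2, 0)     # hypothetical window end

print(time_in_range(night_start, night_end, datetime.time(23, 30)))  # True
print(time_in_range(night_start, night_end, datetime.time(1, 15)))   # True
print(time_in_range(night_start, night_end, datetime.time(12, 0)))   # False
```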