
Hourly run DAG in Airflow

My DAG:

from datetime import datetime

from airflow import DAG

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2020, 1, 10, 7, 1, 0),
    'depends_on_past': False,
    'catchup_by_default': False,
}

dag = DAG('Hourly_test_2', schedule_interval='0 * * * *', default_args=default_args)

It runs every hour, but the tree view shows a time one hour behind. For example, the tree view shows 8 AM when the actual time is 9 AM. How can I make the two times match?

The job should run every hour, and the hour shown in the tree view should match the current hour.


This is not a time synchronization problem; it follows from start_date and schedule_interval. By default, Airflow works out how many times the DAG should have been executed between start_date and the current date, and starts a DAG Run for every interval that has not been executed yet.

In your case the start date is 7:01 and, according to your schedule_interval, the execution intervals are 8:00, 9:00, 10:00 ...

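This is why there is a DAG Run at 8:00. You can disable this behavior by setting the parameter catchup=False in your DAG definition. Note that catchup is an argument of the DAG itself: the 'catchup_by_default' key in default_args has no effect, because catchup_by_default is an option in airflow.cfg, not a DAG or task argument.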

dag = DAG('Hourly_test_2', catchup=False, schedule_interval='0 * * * *', default_args=default_args)
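To make the effect of catchup concrete, here is a minimal sketch in plain Python (it does not import Airflow) that mimics the behaviour described above for the hourly '0 * * * *' schedule. The helper names hourly_stamps and runs_to_create are hypothetical, the value of now is made up for the example, and the logic is a simplified model that assumes the DAG is first switched on at now; it is not the scheduler's actual code.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)

def hourly_stamps(start_date, now):
    """Stamps of all hourly periods that begin at or after start_date
    and have fully ended by `now` (simplified model, hypothetical helper)."""
    # First top-of-the-hour boundary at or after start_date.
    first = start_date.replace(minute=0, second=0, microsecond=0)
    if first < start_date:
        first += HOUR
    stamps = []
    stamp = first
    # A period is only eligible once its end (stamp + 1 hour) has passed.
    while stamp + HOUR <= now:
        stamps.append(stamp)
        stamp += HOUR
    return stamps

def runs_to_create(start_date, now, catchup):
    """With catchup, every missed period gets a run; without it, only the latest."""
    stamps = hourly_stamps(start_date, now)
    return stamps if catchup else stamps[-1:]

start = datetime(2020, 1, 10, 7, 1, 0)   # start_date from the question
now = datetime(2020, 1, 10, 12, 30, 0)   # assumed current time

backfilled = runs_to_create(start, now, catchup=True)
latest_only = runs_to_create(start, now, catchup=False)

print([s.strftime("%H:%M") for s in backfilled])   # ['08:00', '09:00', '10:00', '11:00']
print([s.strftime("%H:%M") for s in latest_only])  # ['11:00']
```

In this model, catchup=False only removes the backlog of older runs. The remaining run is still stamped with the start of its period (11:00), not with the time at which it was started.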

This is how Airflow schedules runs. Check this part of the scheduler documentation:

Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.

Let's Repeat That: the scheduler runs your job one schedule_interval AFTER the start date, at the END of the period.
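The one-hour difference in the question follows directly from this rule. As a rough illustration in plain Python (the function name execution_date_for is made up for this example and is not an Airflow API), a run that the scheduler starts at the top of an hour is stamped with the start of the hour that has just ended:

```python
from datetime import datetime, timedelta

def execution_date_for(trigger_time, interval=timedelta(hours=1)):
    """Stamp of the run started at trigger_time: the start of the period
    that has just ended (hypothetical helper, simplified model)."""
    return trigger_time - interval

triggered_at = datetime(2020, 1, 10, 9, 0)   # wall-clock time the run starts
stamp = execution_date_for(triggered_at)     # time shown in the tree view

print(stamp)  # 2020-01-10 08:00:00
```

So the run shown as 8 AM in the tree view is the one that actually started at 9 AM: it covers the 8:00 to 9:00 period.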

Ref: https://airflow.apache.org/docs/stable/scheduler.html
