
Airflow cron schedule interval not triggering DAG

Note: The crontab.guru links were breaking so I wrapped them in code blocks.

I have a DAG that is to be executed on Mondays at midnight Pacific time (8 AM UTC), bumped by 1 minute to avoid any overlap issues.

Originally the schedule interval was set as 1 8 */1 * 1, which according to https://crontab.guru/#1_8_*/1_*_1 is "At 08:01 UTC (03:01 EST, 00:01 PST) on every day-of-month if it's on Monday".

However, this caused the DAG to trigger every day at 08:01 UTC; the Monday condition seemed to be ignored.

The schedule interval was updated to the simpler 1 8 * * 1, which according to https://crontab.guru/#1_8_*_*_1 is "At 08:01 UTC (03:01 EST, 00:01 PST) on Monday".

This stopped the DAG from executing every day, but it did not trigger on 2019-02-18, the first Monday following the update. I've read some other posts that indicate that the start date might cause this issue, but this task's start date is datetime(2019, 2, 11, 0, 0, 0, 0, pytz.UTC), which is two intervals before the 2019-02-18 run date.
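For reference, a quick way to sanity-check what fire times each expression actually produces is croniter, the library Airflow uses to evaluate cron schedule intervals. A minimal sketch (not from the original post; the base date is simply this DAG's start date, without the timezone):

from datetime import datetime
from croniter import croniter

base = datetime(2019, 2, 11)

for expr in ("1 8 */1 * 1", "1 8 * * 1"):
    it = croniter(expr, base)
    # Print the next four fire times for each expression so they can be compared
    print(expr, "->", [it.get_next(datetime).isoformat() for _ in range(4)])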

Here is the complete DAG/task definition (without imports or specific names):

dag = DAG(
    dag_id="dag",
    description="dag",
    # At 08:01 UTC (03:01 EST, 00:01 PST) on Monday
    # (https://crontab.guru/#1_8_*_*_1)
    schedule_interval="1 8 * * 1",
    catchup=False,
)


task = PythonOperator(
    task_id="handle",
    provide_context=True,
    python_callable=handle,
    dag=dag,
    retries=2,
    retry_delay=timedelta(minutes=15),
    start_date=datetime(2019, 2, 11, 0, 0, 0, 0, pytz.UTC),
)

Any idea why this wouldn't have executed after the 2019-02-18 08:01 UTC interval?

EDIT: The reason you do not see the run execute on the 18th is that you have catchup=False.

This will cause the DAG to skip backfill days if they have already passed. If you want to see the DAG fill in the 17th and the 24th, you would need to set catchup=True, as sketched below.
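A minimal sketch (assumed names, same schedule as above) of the DAG declared with catchup enabled, so that already-passed intervals after start_date are backfilled rather than skipped:

dag = DAG(
    dag_id="dag",
    description="dag",
    # At 08:01 UTC (03:01 EST, 00:01 PST) on Monday
    schedule_interval="1 8 * * 1",
    # start_date set on the DAG itself, at least one interval before the run you expect
    start_date=datetime(2019, 2, 11, 0, 0, 0, 0, pytz.UTC),
    catchup=True,  # backfill past intervals instead of skipping them
)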

Airflow DAGs execute at the END of the schedule interval, so if your start date is the current Monday and your interval is every Monday, the DAG will not execute for this Monday's run until the following Monday.

The main idea here is that the data for the current Monday run is not available until the end of that interval period. This makes more sense if you think about it in terms of daily jobs. If you are running a job that is looking for today's data, that data set will not be complete until the end of today. So if you want to process the data for today, you need to execute your job tomorrow. This is just a convention that Airflow has adopted, like it or not.
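To make the convention concrete, here is a minimal sketch (a hypothetical callable, not the original handle function) of what the weekly run above would see: the run whose execution_date is Monday 2019-02-18 08:01 UTC is only triggered once that interval ends, around Monday 2019-02-25 08:01 UTC.

def print_interval(**context):
    # execution_date marks the START of the interval this run covers;
    # Airflow only triggers the run once that interval has ENDED.
    execution_date = context["execution_date"]
    print("This run covers the week starting:", execution_date)

check = PythonOperator(
    task_id="print_interval",
    provide_context=True,
    python_callable=print_interval,
    dag=dag,
)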

If you would like to adjust the dates, you can use {{ macros.ds_add( ds, 7) }} to shift the execution date by 7 days.
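For example, a minimal sketch (BashOperator used purely for illustration) of rendering the shifted date: ds is the execution date as YYYY-MM-DD, so for the run with execution_date 2019-02-18 this would print 2019-02-25.

shifted = BashOperator(
    task_id="print_shifted_date",
    # macros.ds_add(ds, 7) adds 7 days to the templated execution date string
    bash_command='echo "shifted date: {{ macros.ds_add(ds, 7) }}"',
    dag=dag,
)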

Let me know if this answer makes sense. If not, I will expand on it. This convention has been the most nagging detail we have had to deal with while developing Airflow jobs.
