
Airflow DAG triggers automatically when toggled to ON?

I have created a DAG with the schedule interval "*/10 * * * *". When I toggle the DAG to ON through the Airflow UI (at 01/07/2020 07:50:00), it gets triggered immediately instead of waiting for the specified interval to complete.

# Specified start date
'start_date': datetime.strptime('01/07/2020 06:35:00', '%m/%d/%Y %H:%M:%S')

I have tried adding 'catchup': False to dag_args, but I am still facing the same problem.

It's due to the incorrect start_date you provided. Per the cron expression "*/10 * * * *", your DAG triggers every 10 minutes. Your start_date (01/07/2020 06:35:00) was already in the past when you unpaused the DAG at 01/07/2020 07:50:00, so the 07:40:00 to 07:50:00 period had just closed and the run with run id 01/07/2020 07:40:00 was triggered immediately.
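To make that arithmetic concrete, here is a minimal sketch in plain Python (it does not use Airflow itself). The helper name `latest_completed_period` is hypothetical, and the 10-minute step is hard-coded to mirror "*/10 * * * *"; it only illustrates which period has already closed at the moment the DAG is unpaused.

```python
from datetime import datetime, timedelta

STEP_MINUTES = 10  # mirrors the cron expression "*/10 * * * *"


def latest_completed_period(now):
    """Return (start, end) of the most recent fully elapsed 10-minute period.

    Hypothetical helper for illustration; not part of the Airflow API.
    """
    # Floor `now` to the most recent cron tick (minute divisible by 10).
    period_end = now.replace(
        minute=now.minute - now.minute % STEP_MINUTES, second=0, microsecond=0
    )
    return period_end - timedelta(minutes=STEP_MINUTES), period_end


unpaused_at = datetime(2020, 1, 7, 7, 50, 0)
period_start, period_end = latest_completed_period(unpaused_at)
print(period_start)  # 2020-01-07 07:40:00
print(period_end)    # 2020-01-07 07:50:00
```

The period's left bound, 07:40:00, matches the run id that fired as soon as the DAG was switched on.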

First, I recommend you use constants for start_date, because dynamic ones behave unpredictably depending on when your Airflow pipeline is evaluated by the scheduler.
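A rough way to see why a dynamic start_date is troublesome is the simplified model below. It is plain Python, not Airflow code: `first_run_is_due` and `simulate` are hypothetical names, and the model assumes only that a run can be created once its period has fully elapsed and that start_date is re-evaluated on every scheduler pass.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=10)


def first_run_is_due(start_date, now):
    """The first period closes one interval after start_date (simplified model)."""
    return now >= start_date + INTERVAL


def simulate(start_date_factory, passes):
    """Recompute start_date on each pass, as if the DAG file were re-parsed."""
    return [first_run_is_due(start_date_factory(now), now) for now in passes]


# Four hypothetical scheduler passes, five minutes apart.
passes = [datetime(2020, 1, 7, 7, 50) + timedelta(minutes=5 * i) for i in range(4)]

# Constant start_date: the first period has long since closed on every pass.
constant = simulate(lambda now: datetime(2020, 1, 7, 6, 35), passes)

# Dynamic start_date (like datetime.now()): it moves with the clock,
# so start_date + interval is always in the future.
dynamic = simulate(lambda now: now, passes)

print(constant)  # [True, True, True, True]
print(dynamic)   # [False, False, False, False]
```

In this model the dynamic variant never reaches the end of its first period, which is the kind of surprise a constant start_date avoids.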

More information about start_date is in an FAQ entry that I wrote to sort all this out: https://airflow.apache.org/faq.html#what-s-the-deal-with-start-date

Now, about execution_date and when it is triggered: this is a common gotcha for people onboarding on Airflow. Airflow sets execution_date based on the left bound of the schedule period it is covering, not based on when it fires (which would be the right bound of the period). When running a schedule='@hourly' task, for instance, a task will fire every hour. The task that fires at 2pm will have an execution_date of 1pm, because it assumes that you are processing the 1pm to 2pm time window at 2pm. Similarly, if you run a daily job, the run with an execution_date of 2016-01-01 would trigger soon after midnight on 2016-01-02.
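The two examples above reduce to one line of date arithmetic. The sketch below is plain Python with a hypothetical helper, `fire_time`; the calendar date used for the hourly case is an arbitrary assumption, since the text only gives the times of day.

```python
from datetime import datetime, timedelta


def fire_time(execution_date, interval):
    """A run is labelled with the period's left bound and fires at its right bound."""
    return execution_date + interval


# Hourly: execution_date of 1pm fires at 2pm (the date itself is arbitrary).
hourly = fire_time(datetime(2016, 1, 1, 13, 0), timedelta(hours=1))

# Daily: execution_date of 2016-01-01 fires at midnight on 2016-01-02.
daily = fire_time(datetime(2016, 1, 1), timedelta(days=1))

print(hourly)  # 2016-01-01 14:00:00
print(daily)   # 2016-01-02 00:00:00
```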

This left-bound labelling makes a lot of sense when thinking in terms of ETL and differential loads, but gets confusing when thinking in terms of a simple, cron-like scheduler.
