
airflow trigger_dag execution_date is the next day, why?

I have been testing Airflow heavily recently and have run into a problem with execution_date when running airflow trigger_dag <my-dag>.

I learned from here that execution_date is not what one might first assume:

Airflow was developed as a solution for ETL needs. In the ETL world, you typically summarize data. So, if I want to summarize data for 2016-02-19, I would do it at 2016-02-20 midnight GMT, which would be right after all data for 2016-02-19 becomes available.

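from datetime import datetime, timedelta

from airflow import DAG
# Assumption: the original snippet does not show how `ops` is bound.
from airflow import operators as ops

start_date = datetime.combine(datetime.today(),
                              datetime.min.time())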

args = {
    "owner": "xigua",
    "start_date": start_date
}
dag = DAG(dag_id="hadoopprojects", default_args=args,
          schedule_interval=timedelta(days=1))


wait_5m = ops.TimeDeltaSensor(task_id="wait_5m",
                              dag=dag,
                              delta=timedelta(minutes=5))

The code above is the start of my daily workflow. The first task is a TimeDeltaSensor that waits another 5 minutes before the actual work begins, so my DAG will be triggered at 2016-09-09T00:05:00, 2016-09-10T00:05:00, and so on.

In the web UI I can see something like scheduled__2016-09-20T00:00:00, and the task runs at 2016-09-21T00:00:00, which seems reasonable according to the ETL model.

However, one day my DAG was not triggered for some unknown reason, so I triggered it manually. If I trigger it at 2016-09-20T00:10:00, the TimeDeltaSensor waits until 2016-09-21T00:15:00 before running.

This is not what I want: I want it to run at 2016-09-20T00:15:00, not the next day. I tried passing execution_date through --conf '{"execution_date": "2016-09-20"}', but it doesn't work.

How should I deal with this issue?

$ airflow version
[2016-09-21 17:26:33,654] {__init__.py:36} INFO - Using executor LocalExecutor
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
   v1.7.1.3

First, I recommend using a constant for start_date, because a dynamic one behaves unpredictably depending on when your Airflow pipeline is evaluated by the scheduler.
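As a minimal sketch of the difference, using only the standard library: a start_date derived from datetime.today() moves whenever the DAG file is parsed on a different day, while a literal stays put. The helper dynamic_start_date and the date 2016-09-01 are illustrative assumptions, not values from the question.

```python
from datetime import datetime


def dynamic_start_date(now):
    # Same expression as in the question, with "today" passed in explicitly
    # so the effect is reproducible (hypothetical helper).
    return datetime.combine(now.date(), datetime.min.time())


# Static: identical no matter when the file is parsed.
STATIC_START_DATE = datetime(2016, 9, 1)  # illustrative date, an assumption

parsed_day1 = dynamic_start_date(datetime(2016, 9, 19, 8, 30))
parsed_day2 = dynamic_start_date(datetime(2016, 9, 20, 8, 30))

print(parsed_day1)  # 2016-09-19 00:00:00
print(parsed_day2)  # 2016-09-20 00:00:00

# default_args using the constant instead of the moving value
args = {"owner": "xigua", "start_date": STATIC_START_DATE}
```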

More information about start_date is in this FAQ entry, which I wrote to sort all this out: https://airflow.apache.org/faq.html#what-s-the-deal-with-start-date

Now, about execution_date and when a run is triggered: this is a common gotcha for people onboarding on Airflow. Airflow sets execution_date to the left bound of the schedule period a run covers, not to the time the run fires (which is the right bound of the period). For instance, with schedule='@hourly', a task fires every hour. The task that fires at 2pm has an execution_date of 1pm, because it assumes you are processing the 1pm to 2pm time window at 2pm. Similarly, for a daily job, the run with an execution_date of 2016-01-01 triggers soon after midnight on 2016-01-02.
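The left-bound rule can be written as plain date arithmetic. This is a sketch, not Airflow code; fire_time is a hypothetical helper name, and it assumes a fixed-length schedule interval.

```python
from datetime import datetime, timedelta


def fire_time(execution_date, schedule_interval):
    """Earliest moment a scheduled run labelled execution_date can start:
    the right bound of the period whose left bound is execution_date."""
    return execution_date + schedule_interval


hourly = fire_time(datetime(2016, 1, 1, 13, 0), timedelta(hours=1))
daily = fire_time(datetime(2016, 1, 1), timedelta(days=1))

print(hourly)  # 2016-01-01 14:00:00
print(daily)   # 2016-01-02 00:00:00
```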

This left-bound labelling makes a lot of sense when thinking in terms of ETL and differential loads, but gets confusing when thinking in terms of a simple, cron-like scheduler.
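Applied to the manual trigger in the question, the same arithmetic reproduces the observed wait. The sketch below assumes two things inferred from the question's timestamps rather than confirmed here: a manual trigger stamps the run with the trigger time as its execution_date, and TimeDeltaSensor waits until execution_date plus one schedule interval plus delta. The name sensor_target is hypothetical.

```python
from datetime import datetime, timedelta

SCHEDULE_INTERVAL = timedelta(days=1)  # from the question's DAG
DELTA = timedelta(minutes=5)           # from the question's sensor


def sensor_target(execution_date):
    # Assumed behaviour: end of the covered period, plus the sensor delta.
    return execution_date + SCHEDULE_INTERVAL + DELTA


scheduled = sensor_target(datetime(2016, 9, 20))
manual = sensor_target(datetime(2016, 9, 20, 0, 10))

# execution_date that would release the sensor at the desired time
wanted = datetime(2016, 9, 20, 0, 15)
needed_execution_date = wanted - SCHEDULE_INTERVAL - DELTA

print(scheduled)              # 2016-09-21 00:05:00
print(manual)                 # 2016-09-21 00:15:00
print(needed_execution_date)  # 2016-09-19 00:10:00
```

Under these assumptions, the run would need an execution_date one interval earlier than the trigger time for the sensor to release on the same day.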

Airflow provides times in UTC. I am not sure which timezone you are running the tasks in, so make sure you think in UTC and schedule or trigger the jobs accordingly.

Try converting the time at which you want to trigger to UTC, then trigger the DAG; that works. For more information, see the link below:

https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls
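The UTC conversion suggested in this answer might look like the sketch below. The UTC+8 offset and the helper name to_utc are assumptions for illustration; substitute the offset of the machine you actually trigger from.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the local zone is UTC+8; replace with your own offset.
LOCAL_TZ = timezone(timedelta(hours=8))


def to_utc(local_naive, tz=LOCAL_TZ):
    """Attach the local zone to a naive datetime and convert it to UTC."""
    return local_naive.replace(tzinfo=tz).astimezone(timezone.utc)


utc_time = to_utc(datetime(2016, 9, 20, 0, 10))
print(utc_time.isoformat())  # 2016-09-19T16:10:00+00:00
```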

