How to make a DAG that needs to process the data from today?
I have a DAG that starts at 7:30 pm every day. It needs to process the files located in the /data/yyyy-mm-dd/ directory, where yyyy-mm-dd is that same day.
If I use execution_date + timedelta(days=1), it works when the DAG is run by the scheduler. But this breaks when I use the backfill command (I have to give it 2019-01-01 to run for 2019-01-02).
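A minimal standard-library sketch of the approach described above. Note that the keyword argument is days, not day: timedelta(day=1) raises a TypeError. The helper name data_dir_for is hypothetical, and the /data/yyyy-mm-dd/ layout is taken from the question.

```python
from datetime import datetime, timedelta

# timedelta has no "day" keyword; the correct keyword is "days".
try:
    timedelta(day=1)
except TypeError as exc:
    print("timedelta(day=1) fails:", exc)


def data_dir_for(execution_date: datetime) -> str:
    """Hypothetical helper: folder for the day after execution_date."""
    target_day = execution_date + timedelta(days=1)
    return target_day.strftime("/data/%Y-%m-%d/")


# A daily 19:30 run with execution_date 2019-01-01T19:30 reads the next day's folder.
print(data_dir_for(datetime(2019, 1, 1, 19, 30)))  # /data/2019-01-02/
```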
Is there a better way to accomplish this?
Your question sounds a little confused about the execution_date for backfills. The backfill command asks you to specify the alternate start and end dates to run the DAG in. It then uses the schedule_interval to figure out which runs would have occurred in that range and passes each of them its execution_date.
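To make that concrete, here is a simplified model (not Airflow's actual code) of how a backfill could expand a date range into runs for a daily 19:30 schedule like the one in the question. The function name and the inclusive start/end behavior are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import List


def daily_execution_dates(start: datetime, end: datetime,
                          hour: int = 19, minute: int = 30) -> List[datetime]:
    """Hypothetical, simplified model of a backfill for a daily
    "minute hour * * *" schedule: every scheduled time t with
    start <= t <= end becomes one run whose execution_date is t."""
    # First candidate: the scheduled time on the start date.
    t = start.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if t < start:
        t += timedelta(days=1)
    dates = []
    while t <= end:
        dates.append(t)
        t += timedelta(days=1)
    return dates


# Range 2019-01-01 .. 2019-01-02 (both at midnight) yields a single run.
print(daily_execution_dates(datetime(2019, 1, 1), datetime(2019, 1, 2)))
# [datetime.datetime(2019, 1, 1, 19, 30)]
```

Under this model, a range of 2019-01-01 to 2019-01-02 produces exactly one run, stamped with the start of its interval.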
So, your schedule_interval probably looks like 30 19 * * *. As you know, each run is passed the start of its interval but is only triggered when that interval closes, so a run with a scheduled execution_date of 2019-01-01T19:30:00.000 will be triggered after 2019-01-02T19:30:00.000. It seems that at that time you want the job to pick up the data that landed in /data/2019-01-02/, which is why you're adding a day to the execution_date and formatting it for the source.
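A small sketch of that interval arithmetic, assuming a daily 30 19 * * * schedule. The helper describe_run and the INTERVAL constant are hypothetical; the point is that the folder always matches the date on which the run actually starts.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(days=1)  # assumed daily "30 19 * * *" schedule


def describe_run(execution_date: datetime) -> dict:
    """Hypothetical helper: when the run for this interval starts, and
    which folder it reads if it targets the day the interval closes."""
    interval_end = execution_date + INTERVAL  # run is triggered after this
    return {
        "execution_date": execution_date.isoformat(),
        "triggered_after": interval_end.isoformat(),
        "source_dir": interval_end.strftime("/data/%Y-%m-%d/"),
    }


for day in (1, 2, 3):
    print(describe_run(datetime(2019, 1, day, 19, 30)))
```

The same mapping holds whether the run was created by the scheduler or by a backfill, since both hand the run the start of its interval.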
If you're backfilling, it should behave the same way (rather than shifting time around). So given -s 2019-01-01 -e 2019-01-02, it's going to backfill the run that would have been triggered after 2019-01-02T19:30:00.000, with an execution_date of 2019-01-01T19:30:00.000, isn't it?
As for other ways to do this:
- You could move your runs to midnight and have them use the date in the execution_date. But a 4.5-hour delay (from 19:30 to 00:00) is probably not what you had in mind.
- Airflow also has next_execution_date, which is basically going to give you the same result as adding a day to the execution_date. But you might like the formatted macro {{ next_ds }} for your needs.
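Both options can be checked with plain datetime arithmetic. This sketch only imitates what the templates would render; it assumes a daily schedule (so next_execution_date is execution_date plus one day) and the YYYY-MM-DD format Airflow uses for {{ ds }} and {{ next_ds }}. In a templated field you would write something like /data/{{ next_ds }}/. The helper ds is hypothetical.

```python
from datetime import datetime, timedelta


def ds(d: datetime) -> str:
    # YYYY-MM-DD, the format of the {{ ds }} / {{ next_ds }} macros.
    return d.strftime("%Y-%m-%d")


# Option 1 (assumed "0 0 * * *" schedule): the run for 2019-01-02 has
# execution_date 2019-01-02T00:00 and starts after 2019-01-03T00:00,
# so the date of execution_date itself names the folder.
midnight_run = datetime(2019, 1, 2, 0, 0)
print("/data/%s/" % ds(midnight_run))  # /data/2019-01-02/

# Option 2 (keep "30 19 * * *"): for a daily schedule, next_execution_date
# is execution_date + 1 day, so next_ds names the same folder as adding a day.
evening_run = datetime(2019, 1, 1, 19, 30)
next_execution_date = evening_run + timedelta(days=1)
print("/data/%s/" % ds(next_execution_date))  # /data/2019-01-02/
```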