ETL present data without the schedule interval delay while not breaking Catchup

I have a DAG that needs to be triggered every Tuesday and Friday. For context, the purpose of the DAG is to ETL data that is published only twice a week, on Tuesday and Friday.

This DAG needs to catch up on past runs.

I use {{ execution_date }} in many operator parameters (as an API call parameter, in storage names for keeping a copy of the raw data, ...).

Catchup works well; my issue is with the present.

Because of the schedule interval, every Friday run will ETL the data of the previous Tuesday (it uses execution_date as the API call parameter), and every Tuesday run will ETL the data of the previous Friday.
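To make the offset concrete, here is a minimal standard-library sketch (it does not use Airflow). It assumes a run fires at the end of its schedule interval and is labeled with the interval's start, which is the behavior described above. The helper name `previous_schedule` and the sample dates are hypothetical, chosen only for illustration.

```python
from datetime import date, timedelta

# date.weekday() numbering: Monday=0, so Tuesday=1 and Friday=4.
PUBLISH_WEEKDAYS = {1, 4}


def previous_schedule(day: date) -> date:
    """Return the latest Tuesday or Friday strictly before `day`."""
    d = day - timedelta(days=1)
    while d.weekday() not in PUBLISH_WEEKDAYS:
        d -= timedelta(days=1)
    return d


# A run that fires on Friday 2020-01-10 is labeled with the start of its
# interval, which is the previous schedule point (a Tuesday).
fired_on = date(2020, 1, 10)
execution_date = previous_schedule(fired_on)
print(f"fired on {fired_on:%A %Y-%m-%d}, execution_date is {execution_date:%A %Y-%m-%d}")
```

So an API call parameterized with execution_date asks for Tuesday's data when the run actually fires on Friday.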

What I need is for the Tuesday run to get the data of that same Tuesday, not of the previous Friday.

I thought about using start_date instead of execution_date for the API call, but in that case Catchup will not work as expected.

I can't find a clean solution where Catchup works well and present data is processed without the schedule interval delay.

Any ideas?

EDIT, based on andscoop's answer:

The best solution is to use next_execution_date instead of execution_date.

Catchup will not prevent the most current DAG run from running. It only determines whether previous un-run DAG runs will be run to "catch up".

There is no delay per se; what you are seeing is that the reported execution date only reflects the last completed schedule interval.

You will want to look into Airflow macros to template the exact timestamp you need.
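As a rough check of why next_execution_date works for both backfilled and current runs, the sketch below simulates a Tuesday/Friday schedule with the standard library only (no Airflow). It assumes each run covers the interval between two consecutive schedule points, with execution_date at the start and next_execution_date at the end. The helper names `schedule_points` and `completed_runs` and the dates are hypothetical. In a real DAG the equivalent would be a templated value such as `{{ next_execution_date }}` or `{{ next_ds }}`; newer Airflow releases favor `data_interval_end`, so check the macros reference for your version.

```python
from datetime import date, timedelta

# date.weekday() numbering: Monday=0, so Tuesday=1 and Friday=4.
PUBLISH_WEEKDAYS = {1, 4}


def schedule_points(start: date, end: date) -> list:
    """All Tuesdays and Fridays from `start` to `end`, inclusive."""
    days = (end - start).days
    return [
        start + timedelta(days=n)
        for n in range(days + 1)
        if (start + timedelta(days=n)).weekday() in PUBLISH_WEEKDAYS
    ]


def completed_runs(start: date, today: date) -> list:
    """Pair each interval start (execution_date) with its end (next_execution_date)."""
    points = schedule_points(start, today)
    return list(zip(points, points[1:]))


# Simulate catching up from Friday 2020-01-03 with "today" being Tuesday 2020-01-14.
for execution_date, next_execution_date in completed_runs(date(2020, 1, 3), date(2020, 1, 14)):
    print(f"execution_date={execution_date}  next_execution_date={next_execution_date}")
```

Every pair ends on a publication day, and the last pair ends on the simulated "today", so a parameter built from next_execution_date requests the data published on the day the run fires, for historical and current runs alike.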
