
ETL current data without the schedule interval delay, without breaking Catchup

I have a DAG that needs to be triggered every Tuesday and Friday (for context, the DAG's purpose is to ETL data that is published only twice a week, on Tuesday and Friday).

This DAG needs to catch up on past runs.

I use {{ execution_date }} in many operator parameters (as an API call parameter, in the storage name used to keep a copy of the raw data, ...).

Catchup works well; my issue is with the present.

Because of the schedule interval, every Friday run ETLs the data of the previous Tuesday (execution_date is used as the API call parameter), and every Tuesday run ETLs the data of the previous Friday.
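To make the offset concrete, here is a small standard-library sketch (no Airflow involved) that walks a Tuesday/Friday schedule. It assumes the usual Airflow behavior that a run is triggered at the end of its interval while execution_date marks the start; the helper names and the 2024 dates are made up for illustration.

```python
from datetime import date, timedelta

SCHEDULE_WEEKDAYS = {1, 4}  # date.weekday(): Monday=0, so Tuesday=1 and Friday=4


def next_schedule_day(day: date) -> date:
    """Return the first Tuesday or Friday strictly after `day`."""
    candidate = day + timedelta(days=1)
    while candidate.weekday() not in SCHEDULE_WEEKDAYS:
        candidate += timedelta(days=1)
    return candidate


def describe_runs(first_execution_date: date, count: int):
    """Yield (execution_date, next_execution_date) pairs for `count` runs."""
    execution_date = first_execution_date
    for _ in range(count):
        next_execution_date = next_schedule_day(execution_date)
        yield execution_date, next_execution_date
        execution_date = next_execution_date


# 2024-01-02 is a Tuesday; each run is triggered once its interval has ended.
runs = list(describe_runs(date(2024, 1, 2), 4))
for execution_date, triggered_on in runs:
    print(
        f"run triggered on {triggered_on:%a %Y-%m-%d} "
        f"has execution_date {execution_date:%a %Y-%m-%d}"
    )
```

The run triggered on Tuesday 2024-01-09 carries execution_date Friday 2024-01-05, which is exactly the lag described above.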

What I need is for the Tuesday run to get that Tuesday's data, not the previous Friday's.

I thought about using start_date instead of execution_date for the API call, but in that case Catchup would not work as expected.

I can't find a clean solution where Catchup works well and present data is processed without the schedule interval delay.

Any ideas?

EDIT, based on andscoop's answer:

The best solution is to use next_execution_date instead of execution_date.
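As a rough sketch of what that swap could look like in a Python callable: Airflow passes context values such as next_execution_date to a PythonOperator's python_callable as keyword arguments (in older 1.10.x releases this needs provide_context=True). The function name, the "date" API parameter, and the storage key layout below are hypothetical placeholders, not taken from the original DAG.

```python
from datetime import datetime


def build_api_params(next_execution_date, **_unused_context):
    """Build request parameters from the interval end instead of its start.

    Intended to be used as (or called from) a PythonOperator python_callable,
    which receives next_execution_date among its context keyword arguments.
    """
    publication_day = next_execution_date.strftime("%Y-%m-%d")
    return {
        "date": publication_day,  # hypothetical API parameter name
        "raw_copy_name": f"raw/{publication_day}.json",  # hypothetical storage key
    }


# Simulating the context of the run that is triggered on Tuesday 2024-01-09:
params = build_api_params(
    execution_date=datetime(2024, 1, 5),       # previous Friday, ignored here
    next_execution_date=datetime(2024, 1, 9),  # the Tuesday being published
)
print(params)
```

In templated operator fields the equivalent would be {{ next_execution_date }} or {{ next_ds }}. Because the value is still derived from the schedule rather than from the wall clock, backfilled runs resolve to their own historical dates, so Catchup keeps working. As far as I know, newer Airflow releases (2.2 and later) expose the same value as data_interval_end.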

Catchup will not prevent the most recent DAG run from running. It only determines whether previous, un-run DAG runs will be executed to "catch up".

There is no delay per se; what you are seeing is that the reported execution date is the start of the last completed schedule interval, because a run is only triggered once its interval has ended.

You will want to look into Airflow macros (for example next_execution_date or next_ds) to template the exact timestamp you need.
