
Airflow DAG running twice after DST

I am updating the schedule of my DAG at run time with logic like this:

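import time

from airflow import DAG

# Choose the cron expression based on whether DST is active
# on the machine at the moment the DAG file is parsed.
now = time.localtime()
sched_interval = '30 6 * * *' if now.tm_isdst else '30 7 * * *'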

dag = DAG(
    'my_dag',
    default_args=args,
    schedule_interval=sched_interval,
    max_active_runs=1,
    catchup=False)

The problem is that after the DST change the DAG triggers twice, because the schedule shifts by one hour. How can I avoid the second run in this case? I am using Airflow 1.9.

Thanks!

The Airflow documentation says:

In case you set a cron schedule, Airflow assumes you will always want to run at the exact same time. It will then ignore day light savings time. Thus, if you have a schedule that says run at end of interval every day at 08:00 GMT+1 it will always run end of interval 08:00 GMT+1, regardless if day light savings time is in place.

To me this implies that you don't need to test for DST yourself, because Airflow converts the time automatically.

Airflow 1.9 provides no functionality to account for daylight saving time. It knows nothing about time zones and runs everything in UTC.
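As a rough illustration of why a UTC-only scheduler ends up needing two cron expressions for one local wall-clock time, the sketch below converts 08:30 local time to UTC on a winter and a summer date. The zone `Europe/Amsterdam`, the 08:30 target, and the helper name `utc_time_for_local` are assumptions made for the example; the question does not state them. It uses only the standard library (`zoneinfo`, Python 3.9+) and is not Airflow code.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# Assumed zone for illustration; the question does not say which zone is used.
LOCAL_TZ = ZoneInfo("Europe/Amsterdam")


def utc_time_for_local(year, month, day, hour, minute, tz=LOCAL_TZ):
    """Return the UTC (hour, minute) at which a local wall-clock time occurs."""
    local_dt = datetime(year, month, day, hour, minute, tzinfo=tz)
    utc_dt = local_dt.astimezone(timezone.utc)
    return utc_dt.hour, utc_dt.minute


# The same local time, 08:30, on a winter date and a summer date.
winter = utc_time_for_local(2018, 1, 15, 8, 30)
summer = utc_time_for_local(2018, 7, 15, 8, 30)

print("winter (no DST):", winter)
print("summer (DST):   ", summer)
```

Under these assumptions the two UTC times differ by one hour, which corresponds to the `'30 7 * * *'` and `'30 6 * * *'` expressions in the question.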

As you found out, changing the schedule interval to emulate this missing functionality is problematic, because:

Changing schedule interval always requires changing the dag_id, because previously run TaskInstances will not align with the new schedule interval [1]

So, if possible, the best solution is to upgrade to at least Airflow 1.10, which introduces timezone-aware DAGs. You can then achieve what you want by setting the time zone of your DAG as needed and using a single cron expression for the schedule interval.
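A minimal sketch of such a timezone-aware DAG on Airflow 1.10 or later might look like the following. It follows the pattern shown in the Airflow time zone documentation (a `start_date` carrying a `pendulum` time zone). The zone `Europe/Amsterdam`, the start date, the `owner` value, and the 08:30 local run time are assumptions for illustration; substitute your own values.

```python
from datetime import datetime

import pendulum
from airflow import DAG

# Assumed zone; replace with the zone your schedule should follow.
local_tz = pendulum.timezone("Europe/Amsterdam")

args = {
    "owner": "airflow",  # placeholder value
    # A timezone-aware start_date is what makes the DAG timezone-aware.
    "start_date": datetime(2018, 1, 1, tzinfo=local_tz),
}

dag = DAG(
    "my_dag",
    default_args=args,
    # One cron expression, read in the DAG's time zone,
    # so no tm_isdst branching is needed in the DAG file.
    schedule_interval="30 8 * * *",
    max_active_runs=1,
    catchup=False,
)
```

Because the schedule interval no longer changes between parses, the `dag_id` can stay the same across DST transitions. It is worth checking the behaviour around a transition date on your exact Airflow version before relying on it.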
