
How do multiple Airflow schedulers work together?

I'm new to Airflow. I'm considering running multiple Airflow schedulers (with CeleryExecutor).

But I'm curious about how multiple schedulers operate together:

  1. How do multiple schedulers schedule the serialized DAGs in the metadata database? Are there any rules for this? Which scheduler gets which DAG, and by what rules?

  2. Is there any load balancing between multiple schedulers?

If you can answer these questions, it would be very helpful. Thanks.

Airflow does not provide a magic solution for synchronizing the schedulers, and there is no load balancing between them. Instead, it schedules in batches, which lets all the schedulers work together on scheduling DAG runs and task instances.

The Airflow scheduler runs in an infinite loop. In each scheduling loop, the scheduler creates DAG runs for up to max_dagruns_to_create_per_loop DAGs (it only creates the DAG runs, in the queued state), examines up to max_dagruns_per_loop_to_schedule DAG runs to check whether they can be scheduled, starting with the runs that have the earliest execution dates, and tries to queue up to max_tis_per_query task instances (scheduled -> queued).

All of these selected objects (DAGs, DAG runs, and task instances) are locked in the DB by the scheduler that selected them, so they are not visible to the other schedulers, which do the same thing with other objects.
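To make the locking idea concrete, here is a minimal toy sketch in Python. It is not Airflow code: `ToyTable` and `claim_batch` are hypothetical names invented for illustration, and the in-memory set only imitates database row locks, where a row locked by one caller is skipped by the next rather than waited on (similar in spirit to `SELECT ... FOR UPDATE SKIP LOCKED`).

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """In-memory stand-in for a metadata table with row locks (hypothetical)."""

    rows: list
    locked: set = field(default_factory=set)

    def claim_batch(self, limit):
        """Return up to `limit` unlocked rows, in order, and lock them.

        Rows already locked by another caller are skipped, not waited on.
        """
        batch = [r for r in self.rows if r not in self.locked][:limit]
        self.locked.update(batch)
        return batch


# Two "schedulers" each claim a batch of 2 from the same table.
dag_runs = ToyTable(rows=[f"run_{i}" for i in range(6)])
scheduler_a = dag_runs.claim_batch(limit=2)
scheduler_b = dag_runs.claim_batch(limit=2)

print(scheduler_a)  # ['run_0', 'run_1']
print(scheduler_b)  # ['run_2', 'run_3']
```

The second caller never sees the rows the first one holds, so the two batches are disjoint without any coordination between the schedulers themselves.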

When there are only a few DAGs, DAG runs, or task instances, using large values for these 3 settings may result in all of the scheduling being done by just one of the schedulers.
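The effect of batch size on how work spreads out can be illustrated with a rough simulation. This is only a sketch under simplifying assumptions: `distribute` is a hypothetical helper, the schedulers are assumed to take strict turns, and a single `batch_size` stands in for the three settings above.

```python
def distribute(n_items, n_schedulers, batch_size):
    """Count how many items each scheduler claims when they take turns."""
    claimed = [0] * n_schedulers
    remaining = n_items
    turn = 0
    while remaining > 0:
        # Each scheduler takes at most one batch per turn.
        take = min(batch_size, remaining)
        claimed[turn % n_schedulers] += take
        remaining -= take
        turn += 1
    return claimed


# 10 pending items, 2 schedulers.
print(distribute(10, 2, 512))  # [10, 0]  one scheduler takes everything
print(distribute(10, 2, 3))    # [6, 4]   smaller batches spread the work
```

With a batch size far above the number of pending items, the first scheduler to run claims them all; smaller batches leave something for the others.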

