
How much can Airflow scale?

Has anyone reported how much they've been able to get Airflow to scale at their company? I'm looking at implementing Airflow to execute 5,000+ tasks that will each run hourly, and someday scale that up to 20,000+ tasks. In examining the scheduler, it looks like it might be a bottleneck, since only one instance of it can run, and I'm concerned that with that many tasks the scheduler will struggle to keep up. Should I be?

We run thousands of tasks a day at my company and have been using Airflow for the better part of 2 years. These DAGs run every 15 minutes and are generated from config files that can change at any time (fed in from a UI).
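The config-driven pattern described here (DAGs generated from files that can change at any time) is commonly implemented by looping over the configs at parse time and registering one DAG per config in the module's globals, which is where Airflow's DagBag looks for DAG objects. A minimal, stdlib-only sketch of that pattern, with a `SimpleNamespace` standing in for the real `airflow.DAG` so it runs anywhere (the config shape and the `make_dag` helper are illustrative assumptions, not the poster's actual code):

```python
# Sketch of config-driven DAG generation (hypothetical config shape).
# In a real Airflow DAG file, make_dag would return an airflow.DAG;
# here a SimpleNamespace stands in so the sketch is self-contained.
import json
from types import SimpleNamespace

CONFIGS = [  # in practice these would be read from files fed in by a UI
    json.loads('{"dag_id": "etl_orders", "schedule": "*/15 * * * *"}'),
    json.loads('{"dag_id": "etl_users", "schedule": "*/15 * * * *"}'),
]

def make_dag(cfg):
    # Real version: return DAG(dag_id=cfg["dag_id"],
    #                          schedule_interval=cfg["schedule"], ...)
    return SimpleNamespace(dag_id=cfg["dag_id"], schedule=cfg["schedule"])

# Airflow discovers DAGs by scanning a file's module-level globals,
# so each generated DAG is registered under its own dag_id.
for cfg in CONFIGS:
    globals()[cfg["dag_id"]] = make_dag(cfg)
```

Because the configs are re-read every time the scheduler re-parses the file, editing a config file adds, removes, or retunes a DAG without any code deploy.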

The short answer - yes, it can definitely scale to that, depending on your infrastructure. Some of the new 1.10 features should make this easier than it was on the 1.8 release we run all those tasks on. We ran this on a large Mesos/DCOS cluster that took a good deal of fine-tuning to get to a stable point.

The long answer - although it can scale to that, we've found that a better solution is multiple Airflow instances with different configurations (scheduler settings, number of workers, etc.) optimized for the types of DAGs they are running. A set of DAGs that run long-running machine learning jobs should be hosted on an Airflow instance that is different from the ones running 5-minute ETL jobs. This also makes it easier for different teams to maintain the jobs they are responsible for, and makes it easier to iterate on any fine-tuning that's needed.
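The knobs that typically diverge between such instances live in `airflow.cfg`. A hedged sketch of how an ETL-focused instance and an ML-focused instance might differ (the values are illustrative, not the poster's settings; the keys shown - `parallelism`, `dag_concurrency`, `min_file_process_interval` - are standard 1.8/1.10-era options):

```ini
# airflow.cfg for the short-ETL instance: many small, frequent tasks
[core]
parallelism = 128                # many concurrent task slots cluster-wide
dag_concurrency = 32             # allow each DAG to fan out widely
[scheduler]
min_file_process_interval = 30   # re-parse config-generated DAGs often

# airflow.cfg for the long-running ML instance: few heavy tasks
[core]
parallelism = 16                 # heavy jobs saturate workers quickly
dag_concurrency = 4
[scheduler]
min_file_process_interval = 300  # parsing churn matters less here
```

Splitting instances this way means tuning one workload never destabilizes the other, which is the iteration benefit the answer points to.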
