Google Cloud Composer single vs multiple scheduler

Background: I have a Composer environment with 40 DAGs. A few of them make API calls to download files, some move files from one GCS bucket to another, and some move files from a GCS bucket to BigQuery.

Scheduler configs:

Number of schedulers: 1, CPU: 1 vCPU, Memory: 1.75 GB, Storage: 1 GB

The scheduler used 100% of its CPU, and the DAG parsing time was more than 60 seconds.

According to the documentation, CPU usage should not exceed 80% and DAG parsing time should not exceed 10 seconds. So I ran some tests with different configs.

Test 1: I added one more scheduler with the same 1 vCPU, 1 GB memory and 1 GB storage. With two schedulers we now have 2 vCPUs, 2 GB memory and 2 GB of storage in total. For some reason it still used 100% of the scheduler CPU, and the DAG parsing time fell to 10-20 seconds.

Test 2: With 1 scheduler, I increased the CPU to 1.75 vCPUs, with memory at 1.75 GB, and increased storage to 2 GB. Average CPU usage came to about 1.2, which is less than 80%, and the DAG parsing time fell to less than 4 seconds.
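To put the three runs side by side against the two thresholds quoted from the documentation, here is a small sketch. The class and field names are made up for illustration, the figures are read off the description above, and two of them are assumptions: "more than 60 seconds" is entered as 60, and Test 1 is entered as both vCPUs fully used.

```python
from dataclasses import dataclass

CPU_LIMIT = 0.80       # CPU ceiling quoted from the documentation
PARSE_LIMIT_S = 10.0   # DAG parsing time ceiling, in seconds


@dataclass
class SchedulerRun:
    """One observed configuration (hypothetical helper, not a Composer API)."""
    name: str
    vcpus_allocated: float  # total vCPUs across all schedulers
    vcpus_used: float       # average vCPUs actually consumed
    parse_time_s: float     # DAG parsing time reported for the run

    @property
    def cpu_utilization(self) -> float:
        return self.vcpus_used / self.vcpus_allocated

    def within_limits(self) -> bool:
        return (self.cpu_utilization <= CPU_LIMIT
                and self.parse_time_s <= PARSE_LIMIT_S)


runs = [
    # Assumption: "more than 60 seconds" is recorded as its lower bound, 60.
    SchedulerRun("baseline: 1 scheduler x 1 vCPU", 1.0, 1.0, 60.0),
    # Assumption: 100% scheduler CPU means both vCPUs were saturated;
    # 20 is the upper end of the reported 10-20 second range.
    SchedulerRun("test 1: 2 schedulers x 1 vCPU", 2.0, 2.0, 20.0),
    # "less than 4 seconds" is recorded as its upper bound, 4.
    SchedulerRun("test 2: 1 scheduler x 1.75 vCPU", 1.75, 1.2, 4.0),
]

for run in runs:
    print(f"{run.name}: cpu={run.cpu_utilization:.0%}, "
          f"parse={run.parse_time_s:.0f}s, ok={run.within_limits()}")
```

Only the Test 2 configuration lands under both limits with these numbers.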

I can't understand the actual reason. Shouldn't having two schedulers be faster? Is there something I am missing?

The Airflow scheduler monitors tasks and DAGs and triggers task instances once their dependencies are complete. Multiple schedulers can be added to distribute the load, but that does not by itself improve Airflow's performance. One scheduler might give better performance than multiple schedulers. This can happen when the additional schedulers are underutilized: they consume resources without contributing to performance.

The performance of the scheduler depends on the number of Airflow workers, the number of DAGs and tasks that run in your environment, and the configuration of both Airflow and the environment. It's recommended to start with two schedulers, monitor the performance, and scale according to your requirements. You can check this documentation for more information.
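Since the number and shape of the DAG files is one of the factors listed above, it can help to see which files are slow to load. The sketch below is only a rough proxy, not how Airflow or Composer measures DAG parsing time: it times the execution of each file's top-level code using the standard library. The function names and the 1-second default threshold are hypothetical. Importing a file runs its top-level code for real, so this is best tried on a local copy of the DAG folder with Airflow installed.

```python
import importlib.util
import time
from pathlib import Path


def time_module_import(path: str) -> float:
    """Return the seconds spent executing the top-level code of one file."""
    file = Path(path)
    spec = importlib.util.spec_from_file_location(file.stem, file)
    module = importlib.util.module_from_spec(spec)
    start = time.perf_counter()
    spec.loader.exec_module(module)  # runs the file's top-level code
    return time.perf_counter() - start


def slow_files(folder: str, threshold_s: float = 1.0) -> list[tuple[str, float]]:
    """List (file name, seconds) for files at or above the threshold, slowest first."""
    timings = [
        (p.name, time_module_import(str(p)))
        for p in sorted(Path(folder).glob("*.py"))
    ]
    slow = [t for t in timings if t[1] >= threshold_s]
    return sorted(slow, key=lambda t: t[1], reverse=True)
```

A call such as `slow_files("/path/to/local/dags")` (a placeholder path) returns the files worth inspecting first. Whether slow top-level code is actually the cause in the environment described in the question is not something the source confirms.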
