
Understanding Threading in Google Cloud DataFlow Workers

Original post 2022-08-24 19:58:34 5 1 google-cloud-dataflow

I wrote a simple program that waits for 60 seconds. I have 300 input elements to process.

Number of threads: 1 for batch and 300 for streaming, per this document: https://cloud.google.com/dataflow/docs/resources/faq#beam-java-sdk

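In streaming mode, with 1 worker and 300 threads, the job should finish in 2 to 3 minutes once the overhead of spawning workers and so on is taken into account. My understanding is that there will be 300 threads, one for each of the 300 input elements, that all of them will sleep for 60 seconds, and that the job should then be done. However, the job takes much longer to complete.
Similarly, in batch mode with 1 worker (1 thread) and 300 input elements, it should take 300 minutes to complete.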
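The expectation above can be written out as a small calculation. This is only a sketch of the idealized arithmetic in the question, assuming every thread stays busy and ignoring worker startup and teardown; the class and method names are hypothetical and are not part of any Dataflow or Beam API.

```java
// Idealized estimate only: ignores worker startup/teardown and assumes
// every thread is fully used. All names here are hypothetical.
class ThreadingEstimate {

    /** Wall-clock seconds if `elements` tasks of `sleepSeconds` each run on `threads` threads. */
    static long idealSeconds(long elements, long threads, long sleepSeconds) {
        if (elements < 0 || threads <= 0 || sleepSeconds < 0) {
            throw new IllegalArgumentException("elements and sleepSeconds must be >= 0, threads > 0");
        }
        // Tasks run in "waves" of at most `threads` at a time (ceiling division).
        long waves = (elements + threads - 1) / threads;
        return waves * sleepSeconds;
    }

    public static void main(String[] args) {
        // Streaming assumption from the question: 300 threads on one worker.
        System.out.println("streaming: " + idealSeconds(300, 300, 60) + " s");
        // Batch assumption from the question: 1 thread on one worker.
        System.out.println("batch: " + idealSeconds(300, 1, 60) / 60 + " min");
    }
}
```

Under these assumptions the streaming case comes to 60 seconds and the batch case to 300 minutes, which matches the figures in the question; any real job would add scheduling and VM overhead on top.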

Can someone clarify how this happens at the worker level?

1 Answer

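There is considerable overhead in starting up and tearing down worker VMs, so it is hard to generalize from a short experiment like this one. In addition, there is no promise that there will be a given number of streaming or batch workers, because this is an implementation-dependent parameter that may change at any time for any runner (and may in fact even be chosen dynamically).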
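One way to see the effect of thread count in isolation from VM overhead is to time the same sleeping workload on a local thread pool. The sketch below does not use Dataflow or Beam at all; it uses a plain `java.util.concurrent` fixed thread pool, a scaled-down sleep, and a hypothetical class name, so it says nothing about how a particular runner actually schedules work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Local illustration only; not a Dataflow worker. Names are hypothetical.
class LocalSleepBenchmark {

    /** Runs `tasks` sleeping tasks on a fixed pool of `threads` and returns elapsed milliseconds. */
    static long runMillis(int tasks, int threads, long sleepMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(pool.submit(() -> {
                    try {
                        Thread.sleep(sleepMillis); // stands in for the 60-second wait
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // block until every task has finished
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Scaled down: 20 tasks of 20 ms instead of 300 tasks of 60 s.
        System.out.println("1 thread:   " + runMillis(20, 1, 20) + " ms");
        System.out.println("20 threads: " + runMillis(20, 20, 20) + " ms");
    }
}
```

Locally, the single-thread run takes roughly the sum of all sleeps while the 20-thread run takes roughly one sleep. On Dataflow, the VM startup and teardown time and the runner's own choice of parallelism come on top of this, which is why a short experiment is hard to generalize from.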

