GCP Dataflow batch jobs - Preventing workers from running more than one element at a time in a batch job
I am trying to run a batch job in GCP Dataflow. The job itself is very memory-intensive at times. At the moment the job keeps crashing, as I believe each worker is trying to process multiple elements of the PCollection at the same time. Is there a way to prevent each worker from running more than one element at a time?
The principle of Beam is to write a processing description and let the runtime environment (here, Dataflow) run and distribute it automatically. You can't control what it is doing under the hood.

However, you can try a few different things.
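One option worth experimenting with is reducing per-worker parallelism through pipeline options when launching the job. The sketch below assumes a Beam Python pipeline; `--number_of_worker_harness_threads` and the `no_use_multiple_sdk_containers` experiment are documented Dataflow tuning options, but their exact behavior depends on your SDK version and runner configuration, so verify them against the Dataflow documentation before relying on them. The script name, project, region, bucket, and machine type are placeholders.

```shell
# A sketch of launch flags that reduce per-worker parallelism so fewer
# elements are processed concurrently on each VM. Placeholders:
# my_pipeline.py, my-project, us-central1, gs://my-bucket/tmp.
python my_pipeline.py \
  --runner=DataflowRunner \
  --project=my-project \
  --region=us-central1 \
  --temp_location=gs://my-bucket/tmp \
  --number_of_worker_harness_threads=1 \
  --experiments=no_use_multiple_sdk_containers \
  --machine_type=n1-highmem-4
```

Capping the harness threads at 1 limits how many bundles a single SDK process works on concurrently, and `no_use_multiple_sdk_containers` keeps Dataflow from starting several SDK processes per VM; a high-memory machine type gives each element more headroom. Note these settings trade throughput for memory safety, so expect the job to run longer.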