I am trying to run a batch job in GCP Dataflow. The job itself is very memory intensive at times. At the moment the job keeps crashing, because I believe each worker is trying to process multiple elements of the PCollection at the same time. Is there a way to prevent each worker from running more than one element at a time?
The principle of Beam is that you write a processing description and let the runtime environment (here Dataflow) run and distribute it automatically. You cannot directly control what it does under the hood.
However, you can try a few things: give each worker more headroom by choosing a high-memory machine type, and reduce per-worker parallelism through pipeline options such as the number of worker harness threads (and, for the Python SDK, running a single SDK container per VM instead of one per core).
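As a hedged sketch, here is roughly what those pipeline options look like for the Python SDK on Dataflow. The project ID, region, and machine type below are placeholders; you would pass this list to `beam.options.pipeline_options.PipelineOptions` when building your pipeline:

```python
# Sketch of Dataflow pipeline options that reduce per-worker parallelism
# and increase per-worker memory. Values marked "placeholder" are
# assumptions -- substitute your own project, region, and machine type.
options_args = [
    "--runner=DataflowRunner",
    "--project=my-project",        # placeholder project id
    "--region=us-central1",        # placeholder region
    # Run a single worker harness thread so each worker processes
    # one bundle of elements at a time.
    "--number_of_worker_harness_threads=1",
    # Keep one SDK container per VM instead of one per vCPU core
    # (Python SDK), so threads are not multiplied across containers.
    "--experiments=no_use_multiple_sdk_containers",
    # Give each worker more memory with a high-memory machine type.
    "--machine_type=n1-highmem-8",
]

# In your pipeline code you would then do something like:
#   from apache_beam.options.pipeline_options import PipelineOptions
#   pipeline_options = PipelineOptions(options_args)
#   with beam.Pipeline(options=pipeline_options) as p:
#       ...
```

Note that limiting harness threads trades throughput for memory: the job will run slower, but each worker holds fewer elements in memory at once.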