How is persistent disk use determined in GCP Dataflow?

In the pricing section, Google says that there is a default amount of persistent disk (PD) per worker, which varies between batch and streaming. I am running a job, and the amount of persistent disk in use is much higher than it should be given the number of workers I have and the default PD size. This is consistent across multiple distinct jobs. What is causing the increased PD use? For reference, the default is 480 GB for a streaming worker, but I am getting charged for 5888 GB.

Update as of 2021

Dataflow now has Streaming Engine. Streaming Engine does NOT rely on persistent disks to hold state for streaming jobs. Instead, it provides a service that abstracts the storage of streaming state and snapshots.

If disk billing is a concern in your streaming pipelines, consider using Streaming Engine.

See more information: https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#streaming-engine
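As a rough illustration of opting in, the sketch below assembles command-line options for a Python SDK pipeline. The helper name `streaming_engine_args` and the project and region values are hypothetical placeholders; `--enable_streaming_engine`, `--streaming`, `--max_num_workers` and `--runner` are the Python SDK option names as I understand them, so check them against the page linked above for your SDK version.

```python
from typing import List


def streaming_engine_args(project: str, region: str, max_num_workers: int) -> List[str]:
    """Build pipeline arguments for a streaming job that uses Streaming Engine.

    Hypothetical helper for illustration only; it just assembles strings and
    does not launch anything.
    """
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        "--streaming",
        # Moves streaming state off the worker persistent disks.
        "--enable_streaming_engine",
        # Upper bound used by autoscaling.
        f"--max_num_workers={max_num_workers}",
    ]


if __name__ == "__main__":
    # Placeholder project and region, not values from the question.
    for arg in streaming_engine_args("my-project", "us-central1", 4):
        print(arg)
```

The resulting list can be passed to the pipeline's options when it is constructed, or the same flags can be given directly on the command line.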


This is a streaming pipeline with autoscaling enabled.

According to https://cloud.google.com/dataflow/service/dataflow-service-desc#autoscaling :

Streaming pipelines are deployed with a fixed pool of persistent disks, equal in number to --maxNumWorkers

According to https://cloud.google.com/dataflow/service/dataflow-service-desc#persistent-disk-resources :

The default size of each persistent disk is 250 GB in batch mode and 400 GB in streaming mode.

So the expected value of "Current PD" should be around (your value of maxNumWorkers) * 400 GB, rather than 4 * 400 GB. The disk pool is sized by the autoscaling maximum, not by the number of workers currently running.
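The arithmetic can be sketched as follows. This is only a back-of-the-envelope estimate built from the two documentation quotes above. The function name is made up for illustration, and the value 15 for maxNumWorkers is an assumed example, not a figure from the question.

```python
# Default per-disk size in streaming mode, taken from the documentation quote above.
STREAMING_DISK_GB = 400


def expected_streaming_pd_gb(max_num_workers: int, disk_gb: int = STREAMING_DISK_GB) -> int:
    """Estimate total persistent disk for an autoscaling streaming job.

    The pool holds one disk per possible worker (maxNumWorkers), regardless of
    how many workers are currently running.
    """
    return max_num_workers * disk_gb


if __name__ == "__main__":
    running_workers = 4          # workers currently running, as in the answer above
    assumed_max_workers = 15     # hypothetical maxNumWorkers, for illustration only

    print(expected_streaming_pd_gb(running_workers))      # naive estimate: 1600 GB
    print(expected_streaming_pd_gb(assumed_max_workers))  # pool-based estimate: 6000 GB
```

Comparing the billed figure with the second number rather than the first is what explains a "Current PD" value several times larger than the running worker count suggests.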
