
Custom Google Dataflow Options

Because very little traffic is expected, I need a Dataflow job that uses minimal resources. The target values are 1 vCPU, 1 GB of memory, and 30 GB of storage on a standard persistent disk.

How can I create such a Dataflow job? This is what I have so far:

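    import org.apache.beam.runners.dataflow.DataflowRunner;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    options.setProject("project-id");
    options.setRunner(DataflowRunner.class);
    // Begin: disable autoscaling and pin the job to a single worker
    options.setAutoscalingAlgorithm(DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType.NONE);
    options.setNumWorkers(1);
    // End: autoscaling
    options.setStreaming(true);
    options.setAppName("");
    options.setMaxNumWorkers(1);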

Where in the Dataflow options can I specify resources such as vCPU, memory, and storage (standard persistent disk)?

Update

I'm new to GCP, so any criticism is welcome.

From the Javadocs:

setDiskSizeGb

Remote worker disk size, in gigabytes, or 0 to use the default size.

And ...

setWorkerMachineType

Machine type to create Dataflow worker VMs as.

See GCE machine types for a list of valid options.

If unset, the Dataflow service will choose a reasonable default.

The allowed machine types are listed in the GCE machine types documentation. For your needs ("1 vCPU, 1 GB memory"), the closest match is n1-standard-1, which has 1 vCPU and 3.75 GB of memory.
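The "closest match" reasoning can be sketched as a small lookup: keep only the machine types that meet both minimums, then prefer the fewest vCPUs and, on a tie, the least memory. This is only an illustration. The class `MachineTypePicker` and its `pick` helper are hypothetical names, and the candidate list is a hand-picked subset of the n1-standard family rather than the full GCE catalogue.

```java
import java.util.Arrays;
import java.util.List;

public class MachineTypePicker {
    // Simple value holder for one machine type (hypothetical helper class).
    static final class MachineType {
        final String name;
        final int vcpus;
        final double memoryGb;

        MachineType(String name, int vcpus, double memoryGb) {
            this.name = name;
            this.vcpus = vcpus;
            this.memoryGb = memoryGb;
        }
    }

    // Assumption: a hand-picked subset of n1-standard types, not the full list.
    static final List<MachineType> CANDIDATES = Arrays.asList(
        new MachineType("n1-standard-1", 1, 3.75),
        new MachineType("n1-standard-2", 2, 7.5),
        new MachineType("n1-standard-4", 4, 15.0));

    // Returns the smallest candidate that satisfies both minimums, or null if none does.
    static String pick(int minVcpus, double minMemoryGb) {
        MachineType best = null;
        for (MachineType m : CANDIDATES) {
            if (m.vcpus < minVcpus || m.memoryGb < minMemoryGb) {
                continue; // does not meet the requirements
            }
            boolean smaller = best == null
                || m.vcpus < best.vcpus
                || (m.vcpus == best.vcpus && m.memoryGb < best.memoryGb);
            if (smaller) {
                best = m;
            }
        }
        return best == null ? null : best.name;
    }

    public static void main(String[] args) {
        // Requirements from the question: 1 vCPU, 1 GB of memory.
        System.out.println(pick(1, 1.0)); // prints n1-standard-1
    }
}
```

No type in this subset offers exactly 1 GB, so the sketch picks the smallest one that is at least as large as requested.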

So, if you invoke the following methods on DataflowPipelineOptions ...

    options.setDiskSizeGb(30);
    options.setWorkerMachineType("n1-standard-1");

... then your Dataflow workers will run on VMs with 1 vCPU and 3.75 GB of memory, and each will use a 30 GB disk.
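The same settings can also be supplied as command-line flags, since Beam pipeline options are addressable by the property names behind their setters (for example `--workerMachineType` and `--diskSizeGb`). Below is a minimal sketch that assembles such a flag list using only the standard library. The class `DataflowWorkerFlags` and its `build` helper are hypothetical names for illustration, and handing the result to `PipelineOptionsFactory.fromArgs(...)` is shown only as a comment because it needs the Beam dependency on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

public class DataflowWorkerFlags {
    // Builds the flag equivalents of the option setters used in the question and answer.
    static String[] build(String project, String machineType, int diskSizeGb) {
        List<String> flags = new ArrayList<>();
        flags.add("--runner=DataflowRunner");
        flags.add("--project=" + project);
        flags.add("--streaming=true");
        // Disable autoscaling and pin the job to a single worker.
        flags.add("--autoscalingAlgorithm=NONE");
        flags.add("--numWorkers=1");
        flags.add("--maxNumWorkers=1");
        // Worker VM shape and disk size.
        flags.add("--workerMachineType=" + machineType);
        flags.add("--diskSizeGb=" + diskSizeGb);
        return flags.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] flags = build("project-id", "n1-standard-1", 30);
        for (String flag : flags) {
            System.out.println(flag);
        }
        // With Beam on the classpath, the flags could then be parsed like this:
        // DataflowPipelineOptions options =
        //     PipelineOptionsFactory.fromArgs(flags).as(DataflowPipelineOptions.class);
    }
}
```

Keeping the values in flags rather than in code makes it easy to change the machine type or disk size per environment without recompiling.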
