
Custom Google Dataflow Options

Because very little traffic is expected, I need a Dataflow job that uses minimal resources: 1 vCPU, 1 GB of memory, and 30 GB of storage (standard persistent disk).

How can one create such a Dataflow job? What I have so far is the following:

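    // Imports assume the Apache Beam 2.x Dataflow runner packages.
    import org.apache.beam.runners.dataflow.DataflowRunner;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    options.setProject("project-id");
    options.setRunner(DataflowRunner.class);
    // Begin: disable autoscaling and pin the job to a single worker
    options.setAutoscalingAlgorithm(DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType.NONE);
    options.setNumWorkers(1);
    // End: autoscaling
    options.setStreaming(true);
    options.setAppName("");
    options.setMaxNumWorkers(1);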

Where can one specify resources such as vCPU, memory, and storage (standard persistent disk) in the Dataflow options?

Update

I'm new to GCP, so any criticism is welcome.

From the Javadocs

setDiskSizeGb

Remote worker disk size, in gigabytes, or 0 to use the default size.

And ...

setWorkerMachineType

Machine type to create Dataflow worker VMs as.

See GCE machine types for a list of valid options.

If unset, the Dataflow service will choose a reasonable default.

The allowed machine types are listed in the GCE machine types documentation. For your needs ("1 vCPU, 1 GB memory"), the closest match is n1-standard-1.

So, if you invoke the following methods on DataflowPipelineOptions ...

    options.setDiskSizeGb(30);
    options.setWorkerMachineType("n1-standard-1");

... then your Dataflow workers will run on VMs with 1 vCPU and 3.75 GB of memory, and each will use a 30 GB disk.
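The same settings can also be supplied as command-line flags, because Beam's PipelineOptionsFactory maps each `--name=value` argument onto the matching setter. The following is a minimal sketch of that idea using only the JDK. The class `DataflowWorkerFlags` and its method `minimalWorkerFlags` are hypothetical helpers made up for illustration (they are not part of the Beam SDK), and the flag names are assumed to follow from the setter names used in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the Beam SDK) that assembles Dataflow flags. */
final class DataflowWorkerFlags {

    private DataflowWorkerFlags() {}

    /**
     * Builds flags for a single-worker streaming job with autoscaling disabled.
     * A diskSizeGb of 0 asks the service for its default disk size.
     */
    static String[] minimalWorkerFlags(String projectId, String machineType, int diskSizeGb) {
        if (diskSizeGb < 0) {
            throw new IllegalArgumentException("diskSizeGb must be >= 0");
        }
        List<String> flags = new ArrayList<>();
        flags.add("--project=" + projectId);
        flags.add("--runner=DataflowRunner");
        flags.add("--streaming=true");
        // Disable autoscaling and pin the pool to exactly one worker.
        flags.add("--autoscalingAlgorithm=NONE");
        flags.add("--numWorkers=1");
        flags.add("--maxNumWorkers=1");
        // Worker VM shape: machine type and disk size in gigabytes.
        flags.add("--workerMachineType=" + machineType);
        flags.add("--diskSizeGb=" + diskSizeGb);
        return flags.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Print the flags for the values discussed in the answer.
        for (String flag : minimalWorkerFlags("project-id", "n1-standard-1", 30)) {
            System.out.println(flag);
        }
    }
}
```

With the Beam SDK on the classpath, the resulting array could then be handed to `PipelineOptionsFactory.fromArgs(flags).as(DataflowPipelineOptions.class)`, which should yield the same options as calling the setters directly.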
