
Using run time parameters with BigtableIO in Apache Beam

I am trying to use runtime parameters with BigtableIO in Apache Beam to write to Bigtable.

I have created a pipeline that reads from BigQuery and writes to Bigtable. The pipeline works fine when I provide static parameters (using ConfigBigtableIO and ConfigBigtableConfiguration, following the example here - https://github.com/GoogleCloudPlatform/cloud-bigtable-examples/blob/master/java/dataflow-connector-examples/src/main/java/com/google/cloud/bigtable/dataflow/example/HelloWorldWrite.java ), but I get a compile error when I try to set up the pipeline with runtime parameters. The options interface is set up with every parameter as a runtime ValueProvider.

    p.apply(BigQueryIO.readTableRows().fromQuery(options.getBqQuery())
            .usingStandardSql())
     .apply(ParDo.of(new TransFormFn(options.getColumnFamily(), options.getRowKey(),
            options.getColumnKey(), options.getRowKeySuffix())))
     .apply(BigtableIO.write()
            .withProjectId(options.getBigtableProjectId())
            .withInstanceId(options.getBigtableInstanceId())
            .withTableId(options.getBigtableTableId()));

The compiler expects the argument passed to the last apply() to be an org.apache.beam.sdk.transforms.PTransform with output type OutputT, while BigtableIO.write() returns a Write object. Can you help with the correct syntax to fix this? Thanks.

Runtime parameters are meant to be used in Dataflow templates.

Are you trying to create a template and run the pipeline from that template? If so, you need the following steps:

  1. Create an Options interface that declares the runtime parameters you need, e.g.
    ValueProvider<String> tableId.
  2. Pass these runtime parameters to the config object, e.g. withTableId(ValueProvider<String> tableId) =>
    withTableId(options.getTableId())
  3. Construct your template.
  4. Execute your pipeline using the template.
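
Steps 1 and 2 might look roughly like the sketch below. The class name BigtableTemplateSketch, the nested Options interface and the writeRows helper are made up for illustration; the getter names are taken from the question. It assumes the Beam Java SDK and its Google Cloud IO module are on the classpath, and that the upstream ParDo already emits the element type BigtableIO.write() consumes.

```java
import com.google.bigtable.v2.Mutation;
import com.google.protobuf.ByteString;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PDone;

// Hypothetical wrapper class; only the Beam types and methods are real API.
public final class BigtableTemplateSketch {

  // Step 1: declare each runtime parameter as a ValueProvider<String>.
  public interface Options extends PipelineOptions {
    @Description("Bigtable project ID, supplied when the template is run")
    ValueProvider<String> getBigtableProjectId();
    void setBigtableProjectId(ValueProvider<String> value);

    @Description("Bigtable instance ID, supplied when the template is run")
    ValueProvider<String> getBigtableInstanceId();
    void setBigtableInstanceId(ValueProvider<String> value);

    @Description("Bigtable table ID, supplied when the template is run")
    ValueProvider<String> getBigtableTableId();
    void setBigtableTableId(ValueProvider<String> value);
  }

  // Step 2: hand the ValueProviders straight to BigtableIO.write().
  // The input must be a PCollection of row key -> mutations pairs.
  static PDone writeRows(
      PCollection<KV<ByteString, Iterable<Mutation>>> rows, Options options) {
    return rows.apply(
        BigtableIO.write()
            .withProjectId(options.getBigtableProjectId())
            .withInstanceId(options.getBigtableInstanceId())
            .withTableId(options.getBigtableTableId()));
  }

  private BigtableTemplateSketch() {}
}
```

BigtableIO.write() is a PTransform over PCollection<KV<ByteString, Iterable<Mutation>>>, where Mutation is com.google.bigtable.v2.Mutation. If the DoFn feeding it emits a different element type (the output type of TransFormFn is not shown in the question), apply() will not compile, which may be the cause of the error described above.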

The advantage of a template is that the pipeline is constructed once and can then be executed many times with different runtime parameters. For more information on how to use Dataflow templates: https://cloud.google.com/dataflow/docs/templates/overview

When you are not using a Dataflow template, you don't have to set runtime parameters, i.e. withTableId(ValueProvider<String> tableId). Instead, use withTableId(String tableId).
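
To make "constructed once, executed many times" concrete, here is a plain-Java sketch with no Beam dependency. TemplateSketch and buildWriteStep are made-up names, and the Function only stands in for a deferred parameter; it is not how Beam implements ValueProvider.

```java
import java.util.Map;
import java.util.function.Function;

// Plain-Java illustration only; none of these names come from Beam or Dataflow.
final class TemplateSketch {

    // "Construction time": build the write step once. Only the parameter
    // name is known here, not its value.
    static Function<Map<String, String>, String> buildWriteStep(String paramName) {
        return runArgs -> {
            // "Execution time": the value is looked up for this particular run.
            String tableId = runArgs.get(paramName);
            if (tableId == null) {
                throw new IllegalStateException("Missing runtime parameter: " + paramName);
            }
            return "write to table " + tableId;
        };
    }

    public static void main(String[] args) {
        // Built once...
        Function<Map<String, String>, String> step = buildWriteStep("bigtableTableId");
        // ...then run twice with different runtime arguments.
        System.out.println(step.apply(Map.of("bigtableTableId", "orders")));
        System.out.println(step.apply(Map.of("bigtableTableId", "customers")));
    }
}
```

The same step object serves both runs; only the arguments supplied at execution time differ, which is the role a ValueProvider plays in a template.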

Hope this helps!
