
Google Bigquery Storage Write API apache beam triggering frequency

I am trying to use the new Google BigQuery Storage Write API in a Dataflow job using Beam. I am using

BigQueryIO.<Pair<String,String>>write().withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)

However, when I run it, I get this error:

When writing an unbounded PCollection via FILE_LOADS or STORAGE_API_WRITES, triggering frequency must be specified

However, the Beam docs ( https://beam.apache.org/releases/javadoc/2.7.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html#withTriggeringFrequency-org.joda.time.Duration ) for withTriggeringFrequency say:

This is only applicable when the write method is set to BigQueryIO.Write.Method.FILE_LOADS, and only when writing an unbounded PCollection.

To be clear, I am using the STORAGE_WRITE_API method, not FILE_LOADS.

Why is it asking me to set a triggering frequency?

Edit: Documentation on the new Storage Write API is poor, but my guess is that it does some form of batching under the hood. If so, like the FILE_LOADS method, it would need a frequency to determine how often batches are written.

Looking at the source code where the triggering frequency is fetched, the Storage Write API triggering frequency is taken from BigQueryOptions if and only if the triggeringFrequency of the IO is not set; see getStorageApiTriggeringFrequency(BigQueryOptions options).

So

When writing an unbounded PCollection via FILE_LOADS or STORAGE_API_WRITES, triggering frequency must be specified

is correct.

But

This is only applicable when the write method is set to BigQueryIO.Write.Method.FILE_LOADS, and only when writing an unbounded PCollection.

is wrong.

Should be

This is only applicable when the write method is set to BigQueryIO.Write.Method.FILE_LOADS or BigQueryIO.Write.Method.STORAGE_WRITE_API, and only when writing an unbounded PCollection.

Ideally, you should probably set it through BigQueryOptions.setStorageWriteApiTriggeringFrequencySec().

I think the documentation intentionally leaves out the implementation detail that the option can be overridden through the I/O class builder itself.
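
The fallback described above can be sketched in plain Java. This is only a simplified model, not Beam's actual code: the class TriggeringFrequencyDemo and the helper resolve are hypothetical names, java.time.Duration stands in for the Joda Duration that Beam uses, and throwing when neither value is set is an assumption based on the error message quoted in the question.

```java
import java.time.Duration;

public class TriggeringFrequencyDemo {

    /**
     * Hypothetical model of the lookup order described in the answer:
     * a frequency set on the IO wins; otherwise the pipeline option
     * (in seconds) is used; if neither is set, the write is rejected.
     */
    static Duration resolve(Duration ioLevelFrequency, Integer optionSeconds) {
        if (ioLevelFrequency != null) {
            // Value set on the IO builder takes priority.
            return ioLevelFrequency;
        }
        if (optionSeconds != null) {
            // Fall back to the value from the pipeline options.
            return Duration.ofSeconds(optionSeconds);
        }
        // Assumption: mirrors the error quoted in the question.
        throw new IllegalArgumentException(
            "When writing an unbounded PCollection via FILE_LOADS or STORAGE_API_WRITES, "
                + "triggering frequency must be specified");
    }

    public static void main(String[] args) {
        // IO-level value is set, so the option value is ignored.
        System.out.println(resolve(Duration.ofSeconds(10), 5)); // prints PT10S
        // IO-level value is missing, so the option value is used.
        System.out.println(resolve(null, 5)); // prints PT5S
    }
}
```

In a real pipeline, either setting the option or calling withTriggeringFrequency on the write should be enough to satisfy the check; the sketch only illustrates which of the two takes priority when both are present.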
