I am trying to use the new Google BigQuery Storage Write API in a Dataflow job using Beam. I am using
BigQueryIO.<Pair<String,String>>write().withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
However, when I run it, I get this error:
When writing an unbounded PCollection via FILE_LOADS or STORAGE_API_WRITES, triggering frequency must be specified
However, the Beam docs ( https://beam.apache.org/releases/javadoc/2.7.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html#withTriggeringFrequency-org.joda.time.Duration ) for triggeringFrequency say:
This is only applicable when the write method is set to BigQueryIO.Write.Method.FILE_LOADS, and only when writing an unbounded PCollection.
To be clear, I am using the STORAGE_WRITE_API method, not FILE_LOADS, so I am confused as to why it is asking me to set the triggeringFrequency field.
Edit: Documentation on the new Storage Write API is poor, but I suspect that under the hood it does a form of batching, so, like the FILE_LOADS method, it needs a frequency to determine how often batches are written.
Looking at the source code where the triggering frequency is fetched, the Storage API triggering frequency is taken from BigQueryOptions only when the triggeringFrequency of the IO itself is not set; see getStorageApiTriggeringFrequency(BigQueryOptions options).
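To make that lookup concrete, here is a minimal plain-Java model of it, assuming the precedence described above: the IO-level value wins, then the pipeline option, and if neither is set the unbounded write fails with the error from the question. This is a sketch, not Beam's source: the class and method names (TriggeringFrequencyModel, storageApiTriggeringFrequency, validate) are hypothetical, and java.time.Duration stands in for the org.joda.time.Duration that Beam actually uses.

```java
import java.time.Duration;

/**
 * Hypothetical model of how the Storage Write API triggering frequency is
 * resolved for an unbounded write. Not Beam's source: names are illustrative,
 * and java.time.Duration stands in for org.joda.time.Duration.
 */
class TriggeringFrequencyModel {

    /** IO-level value (withTriggeringFrequency) wins; otherwise fall back to the pipeline option. */
    static Duration storageApiTriggeringFrequency(Duration ioFrequency, Integer optionSeconds) {
        if (ioFrequency != null) {
            return ioFrequency;
        }
        if (optionSeconds != null) {
            return Duration.ofSeconds(optionSeconds);
        }
        return null; // neither source is set
    }

    /** Returns the frequency to use, or throws the error from the question if none is available. */
    static Duration validate(boolean unbounded, Duration ioFrequency, Integer optionSeconds) {
        if (!unbounded) {
            return null; // the check only applies to unbounded PCollections
        }
        Duration frequency = storageApiTriggeringFrequency(ioFrequency, optionSeconds);
        if (frequency == null) {
            throw new IllegalArgumentException(
                "When writing an unbounded PCollection via FILE_LOADS or STORAGE_API_WRITES, "
                    + "triggering frequency must be specified");
        }
        return frequency;
    }

    public static void main(String[] args) {
        // Neither knob set: fails like the question's pipeline.
        try {
            validate(true, null, null);
        } catch (IllegalArgumentException e) {
            System.out.println("error: " + e.getMessage());
        }
        // Only the pipeline option is set: prints PT5S
        System.out.println(validate(true, null, 5));
        // Both are set: the IO-level value wins, prints PT10S
        System.out.println(validate(true, Duration.ofSeconds(10), 5));
    }
}
```

In a real pipeline, the two knobs in this model correspond to calling .withTriggeringFrequency(Duration.standardSeconds(5)) on the BigQueryIO write, or setting options.setStorageWriteApiTriggeringFrequencySec(5) on BigQueryOptions (on the command line, --storageWriteApiTriggeringFrequencySec=5). The 5-second value is only an example.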
So
When writing an unbounded PCollection via FILE_LOADS or STORAGE_API_WRITES, triggering frequency must be specified
is correct.
But
This is only applicable when the write method is set to BigQueryIO.Write.Method.FILE_LOADS, and only when writing an unbounded PCollection.
is wrong.
It should read:
This is only applicable when the write method is set to BigQueryIO.Write.Method.FILE_LOADS or BigQueryIO.Write.Method.STORAGE_WRITE_API, and only when writing an unbounded PCollection.
Ideally, you should set it through BigQueryOptions.setStorageWriteApiTriggeringFrequencySec().
I think the documentation intentionally hides the implementation detail that this option can also be overridden through the I/O class builder itself, via withTriggeringFrequency().