When trying to initialize the Python BigQuery Client() in Apache Beam's Google Cloud Dataflow, it's giving me a TypeError:
TypeError('__init__() takes 2 positional arguments but 3 were given')
I am using Python 3.7 with Apache Beam on Dataflow, and I have to initialize the client and write to BigQuery manually instead of using a PTransform, because I want to use a dynamic table name that is passed through runtime parameters.
I've tried passing the project and credentials to the client, but it doesn't seem to make a difference. Furthermore, if I pin google-cloud-bigquery==1.11.2 instead of 1.13.0 it works fine, and using 1.13.0 outside of Apache Beam also works completely fine.
I have obviously cut out some code, but this is essentially what throws the error:
class SaveObjectsBigQuery(beam.DoFn):
    def process(self, element, *args, **kwargs):
        # Establish BigQuery client
        client = bigquery.Client(project=project)
def run():
    pipeline_options = PipelineOptions()
    # GoogleCloud options object
    cloud_options = pipeline_options.view_as(GoogleCloudOptions)
    pipeline_options.view_as(SetupOptions).save_main_session = True

    with beam.Pipeline(options=pipeline_options) as p:
        _data = (p
                 | "Create" >> beam.Create(["Start"])
                 )

        save_data_bigquery = _data | "Save to BigQuery" >> beam.ParDo(SaveObjectsBigQuery())
In earlier versions of google-cloud-bigquery this works fine, and I am able to create a table with the runtime parameter and call insert_rows_json without any problem. Obviously using the WriteToBigQuery PTransform would be ideal, but it's not possible due to the necessity of naming the BigQuery tables dynamically.
EDIT:
I updated the code to try both a runtime value provider and a lambda function, but received a similar error for both:
AttributeError: 'function/RuntimeValueProvider' object has no attribute 'tableId'
I am essentially trying to use a RuntimeValueProvider when launching a Dataflow template to dynamically name a BigQuery table with the WriteToBigQuery PTransform.
save_data_bigquery = _data | WriteToBigQuery(
    project=project,
    dataset="campaign_contact",
    table=value_provider.RuntimeValueProvider(option_name="table", default_value=None, value_type=str),
    schema="id:STRING",
    create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=BigQueryDisposition.WRITE_APPEND
)

save_data_bigquery = _data | WriteToBigQuery(
    table=lambda table: f"{project}:dataset.{runtime_options.table}",
    schema="id:STRING",
    create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=BigQueryDisposition.WRITE_APPEND
)
As of Beam 2.12, you can use the WriteToBigQuery transform to assign destinations dynamically. I'd recommend you try it out :)
Check out this test in the Beam codebase that shows an example of this.