
TypeError when connecting to Google Cloud BigQuery from Apache Beam Dataflow in Python?

When I try to initialize the Python BigQuery Client() inside Apache Beam on Google Cloud Dataflow, it throws a TypeError:

TypeError('__init__() takes 2 positional arguments but 3 were given')

I am using Python 3.7 with Apache Beam on Dataflow, and I have to initialize the client and write to BigQuery manually instead of using a PTransform, because I want a dynamic table name passed in through runtime parameters.

I've tried passing the project and credentials to the client explicitly, but that doesn't seem to make a difference. If I use google-cloud-bigquery==1.11.2 instead of 1.13.0 it works fine, and 1.13.0 also works completely fine outside of Apache Beam.
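For now I'm working around it by pinning the older client in the requirements file staged to the Dataflow workers (with the standard --requirements_file pipeline option):

# requirements.txt, staged with --requirements_file
google-cloud-bigquery==1.11.2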

I have obviously cut out a bit of code, but this is essentially what throws the error:

import apache_beam as beam
from apache_beam.options.pipeline_options import (
    GoogleCloudOptions, PipelineOptions, SetupOptions)
from google.cloud import bigquery


class SaveObjectsBigQuery(beam.DoFn):
    def process(self, element, *args, **kwargs):
        # Establish BigQuery client (`project` is defined in the code I cut out)
        client = bigquery.Client(project=project)


def run():
    pipeline_options = PipelineOptions()

    # GoogleCloud options object
    cloud_options = pipeline_options.view_as(GoogleCloudOptions)

    pipeline_options.view_as(SetupOptions).save_main_session = True

    with beam.Pipeline(options=pipeline_options) as p:
        _data = (p
                 | "Create" >> beam.Create(["Start"])
                 )

        save_data_bigquery = _data | "Save to BigQuery" >> beam.ParDo(SaveObjectsBigQuery())

In earlier versions of google-cloud-bigquery this works fine, and I am able to create a table with the runtime parameter and call insert_rows_json without any problem. Using the WriteToBigQuery PTransform would obviously be ideal, but it isn't possible because the BigQuery tables need to be named dynamically.
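For reference, this is roughly what the working path on 1.11.2 looks like; a minimal sketch, assuming the table name arrives as a ValueProvider (a hypothetical "table" option), the project id and dataset are placeholders, and each element is a dict matching the id:STRING schema:

import apache_beam as beam
from google.api_core.exceptions import Conflict
from google.cloud import bigquery


class SaveObjectsBigQuery(beam.DoFn):
    def __init__(self, table_name):
        # table_name is a ValueProvider; .get() is only valid at run time
        self.table_name = table_name

    def process(self, element, *args, **kwargs):
        client = bigquery.Client(project="my-project")  # placeholder project id
        table_ref = client.dataset("campaign_contact").table(self.table_name.get())
        table = bigquery.Table(
            table_ref, schema=[bigquery.SchemaField("id", "STRING")])
        try:
            client.create_table(table)
        except Conflict:
            pass  # table already exists
        errors = client.insert_rows_json(table, [element])
        if errors:
            raise RuntimeError(f"BigQuery insert failed: {errors}")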

EDIT:

I updated the code to try both a RuntimeValueProvider and a lambda function, but received a similar error for each:

AttributeError: 'function/RuntimeValueProvider' object has no attribute 'tableId'

I am essentially trying to use a RuntimeValueProvider when launching a Dataflow template to dynamically name a BigQuery table via the WriteToBigQuery PTransform.

# Attempt 1: passing the RuntimeValueProvider directly
save_data_bigquery = _data | WriteToBigQuery(
    project=project,
    dataset="campaign_contact",
    table=value_provider.RuntimeValueProvider(
        option_name="table", default_value=None, value_type=str),
    schema="id:STRING",
    create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=BigQueryDisposition.WRITE_APPEND
)

# Attempt 2: passing a lambda
save_data_bigquery = _data | WriteToBigQuery(
    table=lambda table: f"{project}:dataset.{runtime_options.table}",
    schema="id:STRING",
    create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=BigQueryDisposition.WRITE_APPEND
)

As of Beam 2.12, you can use the WriteToBigQuery transform to assign destinations dynamically. I'd recommend you try it out : )

Check out this test in the Beam codebase that shows an example of this.
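A minimal sketch of the dynamic-destination form (the project id, dataset and the "campaign" routing field below are placeholders; since the callable receives each element, whatever you route on has to be part of the row):

import apache_beam as beam
from apache_beam.io.gcp.bigquery import BigQueryDisposition, WriteToBigQuery


def destination(element):
    # Invoked per element; returns a fully qualified "project:dataset.table"
    return f"my-project:campaign_contact.{element['campaign']}"


with beam.Pipeline() as p:
    _ = (
        p
        | "Create" >> beam.Create([
            {"id": "1", "campaign": "summer"},
            {"id": "2", "campaign": "winter"},
        ])
        | "Save to BigQuery" >> WriteToBigQuery(
            table=destination,  # callable evaluated for every element
            schema="id:STRING,campaign:STRING",
            create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=BigQueryDisposition.WRITE_APPEND,
        )
    )

If the table name comes from a template parameter rather than from the data, the callable can ignore its element argument and call .get() on the ValueProvider inside the function body, since the callable only runs at execution time.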
