
Writing an Apache Beam PCollection to BigQuery causes a TypeError

I have a simple Beam pipeline, as follows:

    with beam.Pipeline() as pipeline:
        output = (
            pipeline
            | 'Read CSV' >> beam.io.ReadFromText('raw_files/myfile.csv',
                                                 skip_header_lines=True)
            | 'Split strings' >> beam.Map(lambda x: x.split(','))
            # to_json is a user-defined function that turns one split row into a dict
            | 'Convert records to dictionary' >> beam.Map(to_json)
            | beam.io.WriteToBigQuery(project='gcp_project_id',
                                      dataset='datasetID',
                                      table='tableID',
                                      create_disposition=bigquery.CreateDisposition.CREATE_NEVER,
                                      write_disposition=bigquery.WriteDisposition.WRITE_APPEND)
        )

However, upon running it I get a TypeError, stating the following:

    line 2147, in __init__
        self.table_reference = bigquery_tools.parse_table_reference(
    ... in parse_table_reference
        if isinstance(table, TableReference):
    TypeError: isinstance() arg 2 must be a type or tuple of types

I have tried defining a TableReference object and passing it to the WriteToBigQuery transform, but I am still facing the same issue. Am I missing something here? I've been stuck on this step for what feels like forever and I don't know what to do. Any help is appreciated!
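
For reference, the TableReference attempt looked roughly like this (a sketch of what I described above, not the exact code; the import path is Beam's internal BigQuery client module):

    from apache_beam.io.gcp.internal.clients.bigquery import TableReference

    # An explicit reference instead of separate project/dataset/table arguments
    table_ref = TableReference(projectId='gcp_project_id',
                               datasetId='datasetID',
                               tableId='tableID')

    # ...then passed to the sink: beam.io.WriteToBigQuery(table=table_ref, ...)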

This probably occurred because you installed Apache Beam without the GCP modules. Please make sure to do the following (ideally in a virtual environment):

pip install apache-beam[gcp]

It's a weird error though, so feel free to file a GitHub issue against the Apache Beam project.
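
As for why the error takes this odd form: a plausible mechanism (an assumption about Beam's internals, not verbatim source) is that Beam guards its GCP-only imports, and when the google-cloud dependencies are missing, names like TableReference are left as None; isinstance(table, None) then fails with exactly this message:

    # Sketch of a guarded-import pattern (assumed, not copied from Beam source)
    # that reproduces the reported TypeError.
    try:
        # Succeeds only when the GCP extras (apache-beam[gcp]) are installed.
        from apache_beam.io.gcp.internal.clients.bigquery import TableReference
    except ImportError:
        TableReference = None  # fallback when the GCP client libraries are missing

    table = 'gcp_project_id:datasetID.tableID'
    # With TableReference = None, the second argument to isinstance() is not a
    # type, so Python raises:
    #     TypeError: isinstance() arg 2 must be a type or tuple of types
    if isinstance(table, TableReference):
        print('already a TableReference')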
