
Writing an Apache Beam PCollection to BigQuery causes a TypeError

I have a simple Beam pipeline, as follows:

    with beam.Pipeline() as pipeline:
        output = (
            pipeline
            | 'Read CSV' >> beam.io.ReadFromText('raw_files/myfile.csv',
                                                 skip_header_lines=True)
            | 'Split strings' >> beam.Map(lambda x: x.split(','))
            # to_json is a user-defined function that turns one split row into a dict
            | 'Convert records to dictionary' >> beam.Map(to_json)
            | beam.io.WriteToBigQuery(project='gcp_project_id',
                                      dataset='datasetID',
                                      table='tableID',
                                      create_disposition=bigquery.CreateDisposition.CREATE_NEVER,
                                      write_disposition=bigquery.WriteDisposition.WRITE_APPEND)
        )

However, upon running it I get a TypeError, stating the following:

    line 2147, in __init__
        self.table_reference = bigquery_tools.parse_table_reference(
    ... in parse_table_reference
        if isinstance(table, TableReference):
    TypeError: isinstance() arg 2 must be a type or tuple of types

I have tried defining a TableReference object and passing it to the WriteToBigQuery transform, but I am still facing the same issue. Am I missing something here? I've been stuck on this step for what feels like forever and I don't know what to do. Any help is appreciated!
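
For reference, the TableReference attempt looked roughly like this (a sketch of what I described above, not the exact code; the import path is Beam's internal BigQuery client module):

    from apache_beam.io.gcp.internal.clients.bigquery import TableReference

    # An explicit reference instead of separate project/dataset/table arguments
    table_ref = TableReference(projectId='gcp_project_id',
                               datasetId='datasetID',
                               tableId='tableID')

    # ...then passed to the sink: beam.io.WriteToBigQuery(table=table_ref, ...)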

This probably occurred because you installed Apache Beam without the GCP modules. Please make sure to do the following (ideally in a virtual environment):

pip install apache-beam[gcp]

It's a weird error though, so feel free to file a GitHub issue against the Apache Beam project.
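
As for why the error takes this odd form: a plausible mechanism (an assumption about Beam's internals, not verbatim source) is that Beam guards its GCP-only imports, and when the google-cloud dependencies are missing, names like TableReference are left as None; isinstance(table, None) then fails with exactly this message:

    # Sketch of a guarded-import pattern (assumed, not copied from Beam source)
    # that reproduces the reported TypeError.
    try:
        # Succeeds only when the GCP extras (apache-beam[gcp]) are installed.
        from apache_beam.io.gcp.internal.clients.bigquery import TableReference
    except ImportError:
        TableReference = None  # fallback when the GCP client libraries are missing

    table = 'gcp_project_id:datasetID.tableID'
    # With TableReference = None, the second argument to isinstance() is not a
    # type, so Python raises:
    #     TypeError: isinstance() arg 2 must be a type or tuple of types
    if isinstance(table, TableReference):
        print('already a TableReference')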
