Google Cloud Dataflow - WriteToBigQuery: 'NoneType' object has no attribute '__getitem__'
GCP Dataflow - NoneType error during WriteToBigQuery()
I'm trying to stream data from a csv file in GCS to BQ using Beam, but I get a NoneType error when WriteToBigQuery is called. The error message:
AttributeError: 'NoneType' object has no attribute 'items' [while running 'Write to BQ/_StreamToBigQuery/StreamInsertRows/ParDo(BigQueryWriteFn)']
My pipeline code:
import apache_beam as beam
from apache_beam.pipeline import PipelineOptions
from apache_beam.io.textio import ReadFromText
options = {
    'project': project,
    'region': region,
    'temp_location': bucket,
    'staging_location': bucket,
    'setup_file': './setup.py'
}
class Split(beam.DoFn):
    def process(self, element):
        n, cc = element.split(",")
        return [{
            'n': int(n.strip('"')),
            'connection_country': str(cc.strip()),
        }]
pipeline_options = beam.pipeline.PipelineOptions(flags=[], **options)
with beam.Pipeline(options=pipeline_options) as pipeline:
    (pipeline
     | 'Read from GCS' >> ReadFromText('file_path*', skip_header_lines=1)
     | 'parse input' >> beam.ParDo(Split())
     | 'print' >> beam.Map(print)
     | 'Write to BQ' >> beam.io.WriteToBigQuery(
         'from_gcs', 'demo', schema='n:INTEGER, connection_country:STRING',
         create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
         write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)
    )
My csv looks like this:
The Beam output at the print() stage looks like this:
Any help is appreciated!
You can filter out the None messages with:
def filter_none_messages(msg):
    print(f"Message filtered: {msg}")
    return msg
and add | "FilterNoneMessages" >> beam.Filter(filter_none_messages) to the pipeline.
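As a side note on where the None elements likely come from (my own reasoning, not part of the original answer): Python's built-in print() returns None, so the 'print' >> beam.Map(print) step replaces every element with None before it reaches WriteToBigQuery. A minimal plain-Python sketch of that behavior, with the truthiness filter that beam.Filter(filter_none_messages) effectively applies:

```python
# print() returns None, so mapping print over the elements (which is
# what 'print' >> beam.Map(print) does) emits None for every element.
rows = [{'n': 1, 'connection_country': 'US'},
        {'n': 2, 'connection_country': 'DE'}]

mapped = [print(r) for r in rows]   # prints each row, but collects None
print(mapped)                       # [None, None]

# Filtering on truthiness (as beam.Filter(filter_none_messages) would)
# then drops those None elements:
filtered = [r for r in mapped if r]
print(filtered)                     # []
```

Note that this also means the filter drops every row, so nothing is written to BQ. If the goal is to keep the debug print and still pass rows downstream, one alternative would be something like beam.Map(lambda row: print(row) or row), which prints and then returns the original row.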