Google Cloud Dataflow - WriteToBigQuery: 'NoneType' object has no attribute '__getitem__'
GCP Dataflow - NoneType error during WriteToBigQuery()
I'm trying to stream data from a csv file in GCS to BQ using Beam, but I get a NoneType error when WriteToBigQuery is called. The error message:
AttributeError: 'NoneType' object has no attribute 'items' [while running 'Write to BQ/_StreamToBigQuery/StreamInsertRows/ParDo(BigQueryWriteFn)']
My pipeline code:
import apache_beam as beam
from apache_beam.pipeline import PipelineOptions
from apache_beam.io.textio import ReadFromText
options = {
    'project': project,
    'region': region,
    'temp_location': bucket,
    'staging_location': bucket,
    'setup_file': './setup.py'
}
class Split(beam.DoFn):
    def process(self, element):
        n, cc = element.split(",")
        return [{
            'n': int(n.strip('"')),
            'connection_country': str(cc.strip()),
        }]
pipeline_options = beam.pipeline.PipelineOptions(flags=[], **options)
with beam.Pipeline(options=pipeline_options) as pipeline:
    (pipeline
     | 'Read from GCS' >> ReadFromText('file_path*', skip_header_lines=1)
     | 'parse input' >> beam.ParDo(Split())
     | 'print' >> beam.Map(print)
     | 'Write to BQ' >> beam.io.WriteToBigQuery(
         'from_gcs', 'demo', schema='n:INTEGER, connection_country:STRING',
         create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
         write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)
    )
My csv looks like this:
The Beam output at the print() stage looks like this:
Any help is appreciated!
You can filter out the None messages with:
def filter_none_messages(msg):
    print(f"Message filtered: {msg}")
    return msg
and add | "FilterNoneMessages" >> beam.Filter(filter_none_messages) to the pipeline.
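As a side note on where the None elements likely come from (my own reasoning, not part of the original answer): Python's built-in print() returns None, so the 'print' >> beam.Map(print) step replaces every element with None before it reaches WriteToBigQuery. A minimal plain-Python sketch of that behavior, with the truthiness filter that beam.Filter(filter_none_messages) effectively applies:

```python
# print() returns None, so mapping print over the elements (which is
# what 'print' >> beam.Map(print) does) emits None for every element.
rows = [{'n': 1, 'connection_country': 'US'},
        {'n': 2, 'connection_country': 'DE'}]

mapped = [print(r) for r in rows]   # prints each row, but collects None
print(mapped)                       # [None, None]

# Filtering on truthiness (as beam.Filter(filter_none_messages) would)
# then drops those None elements:
filtered = [r for r in mapped if r]
print(filtered)                     # []
```

Note that this also means the filter drops every row, so nothing is written to BQ. If the goal is to keep the debug print and still pass rows downstream, one alternative would be something like beam.Map(lambda row: print(row) or row), which prints and then returns the original row.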