Apache Beam with Dataflow: flag 'ignore_unknown_columns' for WriteToBigQuery not working

I am building a streaming pipeline using Apache Beam (Python SDK version 2.37.0) and Google Dataflow to write data I receive via Pub/Sub to BigQuery.

I process the data and end up with rows represented by a dictionary like this:

{'val1': 17.4, 'val2': 40.8, 'timestamp': 1650456507, 'NA_VAL': 'table_name'}

I then want to use WriteToBigQuery to insert the values into my table.

However, my table only has the columns val1, val2, and timestamp, so NA_VAL should be ignored. As I understand the docs, this should be possible by setting ignore_unknown_columns=True.
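For illustration, the effect the flag is supposed to have on the row above (only val1, val2, and timestamp stored, NA_VAL ignored) can be reproduced by hand in plain Python. This is a minimal sketch, not Beam's or BigQuery's implementation: drop_unknown_columns and TABLE_COLUMNS are hypothetical names, and the column set is assumed to match the table described here.

```python
# Hypothetical column set, assumed to match the target table's schema.
TABLE_COLUMNS = frozenset({"val1", "val2", "timestamp"})


def drop_unknown_columns(row, columns=TABLE_COLUMNS):
    """Hypothetical helper: return a copy of `row` without keys that have no matching column."""
    # Build a new dict so the input row is left unchanged.
    return {key: value for key, value in row.items() if key in columns}


row = {"val1": 17.4, "val2": 40.8, "timestamp": 1650456507, "NA_VAL": "table_name"}
print(drop_unknown_columns(row))
# {'val1': 17.4, 'val2': 40.8, 'timestamp': 1650456507}
```

On an SDK without a working ignore_unknown_columns, a helper like this could be applied with beam.Map(drop_unknown_columns) just before beam.io.WriteToBigQuery, so the insert request never contains NA_VAL. That approach would not fit if NA_VAL is still needed by a later step, such as choosing the destination table.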

However, when running the pipeline in Dataflow, I still receive an error and no values are inserted into the table:

There were errors inserting to BigQuery. Will not retry. Errors were [{'index': 0, 'errors': [{'reason': 'invalid', 'location': 'NA_VAL', 'debugInfo': '', 'message': 'no such field: NA_VAL.'}]}]

I tried a simple job configuration like this:

rows | beam.io.WriteToBigQuery(
            table='PROJECT:DATASET.TABLE',
            ignore_unknown_columns=True)

as well as one with these parameters:

rows | beam.io.WriteToBigQuery(
            table='PROJECT:DATASET.TABLE',
            ignore_unknown_columns=True,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            method='STREAMING_INSERTS',
            insert_retry_strategy='RETRY_NEVER')

Question: Am I missing something that prevents the pipeline from working? Does anyone have the same issue and/or a solution?

Answer: Unfortunately, you have been bitten by a bug. It was reported as https://issues.apache.org/jira/browse/BEAM-14039 and fixed by https://github.com/apache/beam/pull/16999. Version 2.38.0 will include this fix. Verification for that release concluded today, so it should be available soon.
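Because the fix depends on the SDK version, it can help to check which version a job is launched with. The sketch below is an illustrative assumption, not part of Beam: has_ignore_unknown_columns_fix and FIXED_IN are hypothetical names, and the comparison looks only at the leading major.minor.patch numbers, ignoring any pre-release or dev suffix.

```python
import re

# Release that, according to the answer, includes the fix for BEAM-14039.
FIXED_IN = (2, 38, 0)


def has_ignore_unknown_columns_fix(version):
    """Hypothetical check: True if a version string like '2.37.0' is at least 2.38.0."""
    # Only the leading major.minor.patch numbers are compared; suffixes are ignored.
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups()) >= FIXED_IN


print(has_ignore_unknown_columns_fix("2.37.0"))  # False
print(has_ignore_unknown_columns_fix("2.38.0"))  # True
```

In practice the input would be apache_beam.__version__ from the environment that launches the Dataflow job; upgrading there, for example with pip install --upgrade "apache-beam[gcp]>=2.38.0" once the release is published, is the fix the answer points to.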
