Dataflow doesn't create an empty partition when writing to a BigQuery time-unit column partition
I'm running workflows that check BigQuery to see whether the output of a dependent workflow has run successfully. The problem is that sometimes the output is empty, but I would still like the table/partition to be created, even if it contains no data.
Previously, when I used sharded tables (table_YYYYMMDD), this worked fine: when there was no data, the table was still created, and that indicated the workflow had run successfully. Now that I use a partitioned table, Dataflow doesn't generate an empty partition.
When I use the Python BigQuery library and upload an empty file, this creates an empty partition.
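For reference, here is a minimal sketch of that client-library approach, assuming google-cloud-bigquery is installed; the schema and table names below are placeholders, not the actual ones. Loading a zero-byte NDJSON body into the hourly partition decorator (table$YYYYMMDDHH) materializes the partition even though no rows are written:

```python
import io
from datetime import datetime


def hour_partition_decorator(table: str, dt: datetime) -> str:
    """Build an HOUR partition decorator, e.g. 'dataset.table$2023010114'."""
    return f"{table}${dt.strftime('%Y%m%d%H')}"


def create_empty_hour_partition(client, table: str, dt: datetime) -> None:
    """Load zero bytes into a partition so it exists even with no data.

    `client` is a google.cloud.bigquery.Client; the one-field schema here
    is a placeholder and should match the real destination table.
    """
    from google.cloud import bigquery  # deferred so the sketch imports cleanly

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        schema=[bigquery.SchemaField("timestamp", "TIMESTAMP")],
        time_partitioning=bigquery.TimePartitioning(
            type_=bigquery.TimePartitioningType.HOUR, field="timestamp"
        ),
    )
    # An empty body loads zero rows but still creates the target partition.
    job = client.load_table_from_file(
        io.BytesIO(b""), hour_partition_decorator(table, dt), job_config=job_config
    )
    job.result()
```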
My hope is that there is some parameter I can set that would solve this problem in Dataflow directly.
Any suggestions? This is how my WriteToBigQuery step looks today:
# Write to BigQuery.
formatted_results | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
    table="table_name${table_format}".format(table_format=arguments.table_format),
    schema=DESTINATION_SCHEMA,
    additional_bq_parameters={'timePartitioning': {'type': 'HOUR', 'field': 'timestamp'}},
    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)
If there's no data, Beam will never write to that partition, and hence the partition will never be created. If you think this is a more general feature request, please file a GitHub issue.
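As a possible workaround for the success check itself (a sketch, not something confirmed by the answer above): rather than relying on the partition being created, the follow-up workflow could query INFORMATION_SCHEMA.PARTITIONS after the pipeline finishes. The dataset and table names here are placeholders:

```python
def partition_exists_sql(dataset: str, table: str, partition_id: str) -> str:
    """SQL that returns one row iff the given partition exists.

    For HOUR partitioning, partition_id looks like '2023010114'.
    """
    return (
        f"SELECT partition_id "
        f"FROM `{dataset}.INFORMATION_SCHEMA.PARTITIONS` "
        f"WHERE table_name = '{table}' AND partition_id = '{partition_id}'"
    )


def partition_exists(client, dataset: str, table: str, partition_id: str) -> bool:
    """Run the check with a google.cloud.bigquery.Client (placeholder usage)."""
    rows = list(client.query(partition_exists_sql(dataset, table, partition_id)))
    return len(rows) > 0
```

Note that this only tells you the partition exists, not that it was written by the most recent run, so it fits best when each run targets a fresh partition.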