
Dataflow doesn't create an empty partition when writing to a BigQuery time-unit column partition

I'm running workflows that check BigQuery to see whether the output of a dependent workflow was produced successfully. The problem is that the output is sometimes empty, but I would still like the table/partition to be created, even if it contains no data.

Previously, when I used sharded tables (table_YYYYMMDD), this worked fine: even when there was no data, the table was still created, which indicated that the workflow had run successfully. Now that I use a partitioned table, Dataflow doesn't generate an empty partition.

When I use the Python BigQuery library to upload an empty file, an empty partition is created.

My hope is that there is some parameter I can set that would solve this problem directly in Dataflow.

Any suggestions? This is what my WriteToBigQuery step looks like today:

        # Write to BigQuery, targeting one partition via the `$` decorator.
        formatted_results | 'Write to BigQuery' >> WriteToBigQuery(
            table='table_name${table_format}'.format(table_format=arguments.table_format),
            schema=DESTINATION_SCHEMA,
            additional_bq_parameters={'timePartitioning': {'type': 'HOUR', 'field': 'timestamp'}},
            create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=BigQueryDisposition.WRITE_TRUNCATE)

If there's no data, Beam never writes to that partition, so the partition is never created. If you think this is a more general feature request, please file a GitHub issue.
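In the meantime, the empty-file load described in the question could be run as a separate step once the pipeline has finished. The following is only a rough sketch, not a tested recipe: it assumes the `google-cloud-bigquery` client library, a destination table that already exists with hourly partitioning on `timestamp`, and partition hours expressed in UTC. The helper names `hourly_partition_id` and `create_empty_partition`, and the table ID in the usage note, are made up for illustration. That an empty load actually materialises the partition is taken from the question's own observation.

```python
import io
from datetime import datetime


def hourly_partition_id(table_id: str, hour: datetime) -> str:
    """Build 'table$YYYYMMDDHH', the decorator form for an hourly partition.

    `hour` is assumed to already be in UTC.
    """
    return f"{table_id}${hour.strftime('%Y%m%d%H')}"


def create_empty_partition(client, table_id: str, hour: datetime):
    """Load an empty file into one hourly partition (hypothetical helper).

    `client` is expected to be a google.cloud.bigquery.Client.
    Assumes the destination table already exists, so no schema is passed.
    """
    # Imported lazily so hourly_partition_id works without the library installed.
    from google.cloud import bigquery

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        # WRITE_TRUNCATE mirrors the write disposition used in the pipeline.
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
        time_partitioning=bigquery.TimePartitioning(
            type_=bigquery.TimePartitioningType.HOUR,
            field="timestamp",
        ),
    )
    job = client.load_table_from_file(
        io.BytesIO(b""),  # empty payload: no rows are written
        hourly_partition_id(table_id, hour),
        job_config=job_config,
    )
    return job.result()  # block until the load job completes
```

One possible use is to call `create_empty_partition(bigquery.Client(), "my_dataset.table_name", partition_hour)` after `wait_until_finish()` returns, and only when the pipeline is known to have produced no rows, since `WRITE_TRUNCATE` would otherwise overwrite the data just written to that partition.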
