
Dataflow doesn't create an empty partition when writing to a BigQuery time-unit column-partitioned table

I'm running workflows that check BigQuery to see whether the output of a dependent workflow was produced successfully. The problem is that sometimes the output is empty, but I would still like the table/partition to be created, even if it contains no data.

Previously, when I used table sharding (table_YYYYMMDD), this worked fine: even when there was no data, the table was still created, which indicated that the workflow had run successfully. Now that I use a partitioned table, Dataflow doesn't create an empty partition.

When I use the Python BigQuery client library and upload an empty file, an empty partition is created.
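For reference, that client-library workaround can be sketched as follows: loading zero bytes into a partition decorator (`table$YYYYMMDDHH` for an `HOUR` partition) still materializes the partition. Project, table, and field names here are illustrative, not from the original pipeline:

```python
import io


def hour_partition_decorator(table_id: str, suffix: str) -> str:
    # HOUR-partitioned tables address a single partition as table$YYYYMMDDHH.
    return "{table}${suffix}".format(table=table_id, suffix=suffix)


def create_empty_hour_partition(client, table_id: str, suffix: str, schema) -> None:
    """Sketch: force an empty HOUR partition to exist by loading an empty
    newline-delimited JSON file into the partition decorator.

    `client` is a google.cloud.bigquery.Client; the import is kept local so
    the helpers above stay usable without the library installed.
    """
    from google.cloud import bigquery

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        schema=schema,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    # Loading zero bytes succeeds and creates the (empty) partition.
    client.load_table_from_file(
        io.BytesIO(b""),
        hour_partition_decorator(table_id, suffix),
        job_config=job_config,
    ).result()
```

This could be run as a post-pipeline step so the dependent workflow always finds the partition, even when the pipeline emitted no rows.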

My hope is that there is some parameter I can set that would solve this problem directly in Dataflow.

Any suggestions? This is how my WriteToBigQuery step looks today:

        # Write to BigQuery.
        formatted_results | 'Write to BigQuery' >> WriteToBigQuery(
            table='table_name${table_format}'.format(table_format=arguments.table_format),
            schema=DESTINATION_SCHEMA,
            additional_bq_parameters={'timePartitioning': {'type': 'HOUR', 'field': 'timestamp'}},
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)
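For context, the `table` argument above expands into BigQuery's partition-decorator syntax: `table$SUFFIX` addresses a single partition, and an `HOUR` partition uses a `YYYYMMDDHH` suffix. A minimal sketch with an illustrative suffix (the real value comes from `arguments.table_format`):

```python
# The "$" in the template is literal; str.format only substitutes
# {table_format}, producing BigQuery's partition decorator.
table_format = "2023010112"  # illustrative YYYYMMDDHH value for an HOUR partition
table = "table_name${table_format}".format(table_format=table_format)
print(table)  # table_name$2023010112
```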

If there's no data, Beam never writes to that partition, so the partition is never created. If you think this is a more general feature request, please file a GitHub issue.

