I'm running workflows that check BigQuery to see whether the output of a dependent workflow was produced successfully. The problem is that sometimes the output is empty, but I would still like the table/partition to be created, even if it contains no data.
Previously, when I used table sharding (table_YYYYMMDD), this worked fine: even when there was no data, the table was still created, and that indicated that the workflow had run successfully. Now that I use a partitioned table, Dataflow doesn't generate an empty partition.
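For context, the existence check I run looks roughly like the sketch below. It assumes the google-cloud-bigquery client library; the project, dataset, table name, and partition id are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client()

    # For an HOUR-partitioned table, partition_id has the form YYYYMMDDHH.
    query = """
        SELECT partition_id
        FROM `my-project.my_dataset.INFORMATION_SCHEMA.PARTITIONS`
        WHERE table_name = 'table_name'
          AND partition_id = @partition_id
    """
    job = client.query(
        query,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("partition_id", "STRING", "2023010100")
            ]
        ),
    )

    # An empty result means the upstream workflow hasn't produced the partition yet.
    partition_exists = len(list(job.result())) > 0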
When I use the Python BigQuery client library and load an empty file, the empty partition is created.
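Concretely, the load looks something like this (a sketch with placeholder names; the $YYYYMMDDHH decorator targets one hourly partition, and the table is assumed to already exist so its schema is reused):

    import io

    from google.cloud import bigquery

    client = bigquery.Client()

    # "$2023010100" is the decorator for one HOUR partition (placeholder value).
    table_id = "my-project.my_dataset.table_name$2023010100"

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )

    # Loading an empty file writes zero rows but still materializes the partition.
    client.load_table_from_file(io.BytesIO(b""), table_id, job_config=job_config).result()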
My hope is that there is some parameter I can set that would solve this problem in Dataflow directly.
Any suggestions? This is how my WriteToBigQuery step looks today:
import apache_beam as beam
from apache_beam.io.gcp.bigquery import BigQueryDisposition, WriteToBigQuery

# Write to BigQuery.
formatted_results | 'Write to BigQuery' >> WriteToBigQuery(
    # The "$" prefix makes this a partition decorator on the destination table.
    table="table_name${table_format}".format(table_format=arguments.table_format),
    schema=DESTINATION_SCHEMA,
    additional_bq_parameters={'timePartitioning': {'type': 'HOUR', 'field': 'timestamp'}},
    create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=BigQueryDisposition.WRITE_TRUNCATE)
If there's no data, Beam will never write to that partition, and hence that partition will never be created. If you think this is a more general feature request, please file a GitHub issue.
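In the meantime, a possible workaround (not a Beam/Dataflow parameter) is to issue the empty-file load from the question yourself after the pipeline finishes, so the partition exists even on empty runs. This is a sketch: the pipeline object, project, dataset, table name, and partition decorator are placeholders.

    import io

    from google.cloud import bigquery

    def ensure_partition(table_id):
        """Materialize a partition by loading an empty newline-delimited JSON file."""
        client = bigquery.Client()
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
            # Append so any rows the pipeline already wrote are preserved.
            write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        )
        client.load_table_from_file(io.BytesIO(b""), table_id, job_config=job_config).result()

    result = pipeline.run()
    result.wait_until_finish()
    ensure_partition("my-project.my_dataset.table_name$2023010100")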