
Google BigQuery WRITE_TRUNCATE erasing all data

I have a table set up in BQ where, if I write data that already exists on a certain date partition, I want it to be overwritten. I've set up the job_config to use WRITE_TRUNCATE.

# file_obj = some NDJSON StringIO file-like object
from google.cloud import bigquery

client = bigquery.Client()

# Build a reference to the destination table
dest_dataset = 'test'
dest_table_name = 'sales_data'
destination_dataset = client.dataset(dest_dataset)
destination_table = destination_dataset.table(dest_table_name)

# Use LoadJobConfig for load jobs (QueryJobConfig is for query jobs);
# the destination is passed directly to load_table_from_file below
job_config = bigquery.LoadJobConfig()

# Set writeDisposition & sourceFormat
job_config.write_disposition = 'WRITE_TRUNCATE'
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON

# Set partitioning
time_partitioning = bigquery.table.TimePartitioning(
    bigquery.table.TimePartitioningType.DAY, 'date'
)
job_config.time_partitioning = time_partitioning

# Start the load job
job = client.load_table_from_file(
    file_obj, destination_table,
    job_config=job_config
)
# Wait for the job to finish
job.result()

However, I noticed that when I backfill data, it always overwrites all data in the table even if the date partition is different. For example, if the table holds data from 20190101-20190201 and I load data from 20190202 to the present, the whole table gets erased and only the new data remains. Shouldn't the older data be preserved, since it's on a different partition date? Any idea why this is happening, or what I'm missing?


job_config.write_disposition = 'WRITE_TRUNCATE' is a whole-table action: per the documentation, "If the table already exists - overwrites the table data." It does not take partitioning into account and affects the entire table.

If you need to overwrite a specific partition, you must reference that partition explicitly in the destination, using a partition decorator such as sales_data$20190202.
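As a sketch of the above (using the table name from the question and an example date), targeting one partition means appending the $YYYYMMDD decorator to the table ID before starting the load, so WRITE_TRUNCATE only replaces that partition:

```python
from datetime import date

def partition_table_id(table_name: str, day: date) -> str:
    """Append a BigQuery partition decorator ($YYYYMMDD) to a table name."""
    return f"{table_name}${day.strftime('%Y%m%d')}"

# Example: target only the 2019-02-02 partition of sales_data
target = partition_table_id("sales_data", date(2019, 2, 2))
print(target)  # sales_data$20190202

# With the google-cloud-bigquery client, the decorated name becomes the
# destination of the load job (sketch only; requires credentials and the
# client/job_config objects from the question):
#
# table_ref = client.dataset("test").table(target)
# job = client.load_table_from_file(file_obj, table_ref, job_config=job_config)
# job.result()
```

With this destination, a backfill for 20190202 truncates and reloads only that day's partition, leaving the 20190101-20190201 partitions intact.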

