How to perform an UPSERT operation using the Python BigQuery client when writing JSON records

I am writing JSON records into a BigQuery table using the function bq.insert_rows_json(f'{project}.{dataset}.{table_name}', rows_to_insert). This operation runs in INSERT mode. I was wondering whether I could use the same function in UPSERT mode. Is that possible? I checked the documentation here but did not find an argument for that.

I can't find a built-in UPSERT function in the Python client. However, you can consider the approach below, which is derived from @Mr.Nobody's comment.

from google.cloud import bigquery

client = bigquery.Client()

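# MERGE updates the rows whose key already exists in the target table (T)
# and inserts the rows from the source table (S) whose key does not.
# Backticks quote the table paths, whose names contain a hyphen.
query_job = client.query(
    """
    MERGE `my-dataset.json_table` T
    USING `my-dataset.json_table_source` S
    ON T.int64_field_0 = S.int64_field_0
    WHEN MATCHED THEN
      UPDATE SET string_field_1 = S.string_field_1
    WHEN NOT MATCHED THEN
      INSERT (int64_field_0, string_field_1)
      VALUES (S.int64_field_0, S.string_field_1)
    """
)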

results = query_job.result()  # Waits for job to complete.

With this approach, you first need to load all of your "updated" JSON data into a separate source table before inserting it into or updating it in your main BigQuery table. The query then matches each source row against the main table on the primary ID (the uniqueness check): if the ID is already there, the query performs an UPDATE; if it is not, the query performs an INSERT.
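The two steps described above (stage the JSON rows, then MERGE) could be wrapped roughly as follows. This is only a sketch under assumptions: the helper names `build_merge_sql` and `upsert_json_rows` are hypothetical, the table and column names are passed in as trusted strings rather than user input, and `client` is expected to be a `google.cloud.bigquery.Client`, whose `load_table_from_json` and `query` methods are the only library calls used.

```python
from typing import Any, Iterable, Mapping, Optional, Sequence


def build_merge_sql(target: str, staging: str, key: str, columns: Sequence[str]) -> str:
    """Build a MERGE statement that upserts `staging` into `target` on `key`.

    Hypothetical helper: all names are interpolated as-is, so they must
    come from trusted configuration, never from user input.
    """
    if key not in columns:
        raise ValueError("the key column must be listed in columns")
    update_cols = [c for c in columns if c != key]
    if not update_cols:
        raise ValueError("need at least one non-key column to update")
    set_clause = ", ".join(f"{c} = S.{c}" for c in update_cols)
    col_list = ", ".join(columns)
    val_list = ", ".join(f"S.{c}" for c in columns)
    return (
        f"MERGE `{target}` T\n"
        f"USING `{staging}` S\n"
        f"ON T.{key} = S.{key}\n"
        f"WHEN MATCHED THEN\n"
        f"  UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN\n"
        f"  INSERT ({col_list}) VALUES ({val_list})"
    )


def upsert_json_rows(
    client: Any,
    rows: Iterable[Mapping[str, Any]],
    target: str,
    staging: str,
    key: str,
    columns: Sequence[str],
    load_job_config: Optional[Any] = None,
) -> str:
    """Load `rows` into the staging table, then MERGE it into the target."""
    # Step 1: load the JSON rows into the staging (source) table and wait.
    load_job = client.load_table_from_json(
        list(rows), staging, job_config=load_job_config
    )
    load_job.result()
    # Step 2: run the MERGE and wait for it to finish.
    sql = build_merge_sql(target, staging, key, columns)
    client.query(sql).result()
    return sql
```

A call might look like `upsert_json_rows(bigquery.Client(), rows_to_insert, "my-dataset.json_table", "my-dataset.json_table_source", "int64_field_0", ["int64_field_0", "string_field_1"])`. Passing `load_job_config=bigquery.LoadJobConfig(write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE)` would make each run replace the staging table instead of appending to it, which is probably what you want here. Note that BigQuery rejects a MERGE in which more than one source row matches the same target row, so the staged rows should be unique on the key.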

Screenshots of both tables before running the Python code:

Main Table: [image] Source Table: [image]

Screenshot of the Main Table after the Python code finished executing: [image]

Conclusion: The row with int64_field_0 = 4 was updated (from version 1.0.0 to 6.5.1) because it already existed in the main table. The row with int64_field_0 = 5 was inserted because it did not yet exist in the main table.
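Since the screenshots are not reproduced here, the same outcome can be illustrated with a small in-memory simulation of the MERGE semantics. This is an illustrative sketch only: it does not touch BigQuery, `merge_rows` is a hypothetical helper, and apart from the 1.0.0 to 6.5.1 change for key 4 and the insertion of key 5, the row values are made up.

```python
def merge_rows(main, source, key):
    """Simulate MERGE in memory: update rows whose key matches, insert the rest."""
    merged = {row[key]: dict(row) for row in main}  # copy so inputs stay untouched
    for row in source:
        if row[key] in merged:
            # WHEN MATCHED THEN UPDATE. This overwrites every column present
            # in the source row; with two columns that is the same as
            # updating string_field_1 only.
            merged[row[key]].update(row)
        else:
            # WHEN NOT MATCHED THEN INSERT
            merged[row[key]] = dict(row)
    return list(merged.values())


main_table = [
    {"int64_field_0": 3, "string_field_1": "3.0.0"},  # made-up row
    {"int64_field_0": 4, "string_field_1": "1.0.0"},
]
source_table = [
    {"int64_field_0": 4, "string_field_1": "6.5.1"},
    {"int64_field_0": 5, "string_field_1": "0.0.1"},  # made-up value
]

result = merge_rows(main_table, source_table, "int64_field_0")
for row in result:
    print(row)
```

The row with key 3 is left alone, key 4 picks up the new version, and key 5 is appended as a new row.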
