JSON upload to BigQuery

I'm trying to automate JSON data upload to BigQuery using two Cloud Functions, both of which deploy successfully, and a Cloud Scheduler job that runs successfully. After the scheduler runs, the data gets uploaded to my Cloud Storage bucket, but it never gets loaded into BigQuery.

Below are my code and JSON data:

# function 1 triggered by http
import json

import requests
from google.cloud import storage

def function(request):
    url = "https://api...."
    headers = {"Content-Type" : "application/json",
            "Authorization" : "..."}
        
    response = requests.get(url, headers=headers)

    json_data = response.json()
    pretty_json = json.dumps(json_data, indent=4, sort_keys=True)

    storage_client = storage.Client()
    bucket = storage_client.bucket("bucket_name")
    blob = bucket.blob("blob_name")

    blob.upload_from_string(pretty_json)

# function 2 triggered by cloud storage -> event type finalize/create
from google.cloud import bigquery

def function_2(data, context):
    client = bigquery.Client()

    table_id = "booming-post-322920:dataset_name.table_name"

    job_config = bigquery.LoadJobConfig()
    job_config.schema=[
        bigquery.SchemaField("order_items", "INTEGER"),
        bigquery.SchemaField("created_at", "TIMESTAMP"),
        .....,     
        bigquery.SchemaField("updated_at", "TIMESTAMP")
    ]

    job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON

    uri = 'gs://bucket_name/blob_name' 

    load_job = client.load_table_from_uri(
        uri,
        table_id,
        location="US",  
        job_config=job_config
    ) 

    load_job.result()  

This is what my JSON data (pretty_json) looks like:

{
    "records": [
        {
            "active": null,
            "approved": null,
            "buyer": [
                1
            ],
            "cancel_reason": null,
            "cancelled": null,
            "chef": [
                1
            ],
            "completed": null,
            "created_at": "2021-07-15T17:44:31.064Z",
            ...

Please advise.

I think the main problem is the format of your JSON file: you specify newline-delimited JSON (bigquery.SourceFormat.NEWLINE_DELIMITED_JSON), as BigQuery requires, but the file you upload doesn't conform to that format. Newline-delimited JSON needs one complete JSON object per line, whereas your file is a single pretty-printed object that wraps the rows in a records array.

Please consider the following modifications to your first function:

import json

import requests
from google.cloud import storage


def function(request):
    url = "https://api...."
    headers = {"Content-Type": "application/json",
               "Authorization": "..."}

    response = requests.get(url, headers=headers)

    json_data = response.json()

    # Serialize each record on its own line (newline-delimited JSON)
    # instead of pretty-printing the whole response.
    records = [json.dumps(record) for record in json_data["records"]]
    records_data = "\n".join(records)

    storage_client = storage.Client()
    bucket = storage_client.bucket("bucket_name")
    blob = bucket.blob("blob_name")

    blob.upload_from_string(records_data)

Your JSON will now look like the following:

{"active": null, "approved": null, "buyer": [1], "cancel_reason": null, "cancelled": null, "chef": [1], "completed": null, "created_at": "2021-07-15T17:44:31.064Z", "delivery": false, "delivery_address": null, "delivery_fee": null, "delivery_instructions": null, "discount": 0, "id": 1, "name": "Oak's Order", "notes": null, "order_delivery_time": null, "order_id": null, "order_ready_time": null, "order_submitted_time": null, "paid": null, "pickup_address": "", "promo_applied": null, "promo_code": null, "rated": null, "ratings": null, "review": null, "seller": [1], "status": "In Process", "tax": null, "tip": 0, "total": null, "type": "Pick Up", "updated_at": "2021-07-15T17:44:31.064Z"}
{"active": null, "approved": null, "buyer": [2], "cancel_reason": null, "cancelled": null, "chef": [1], "completed": null, "created_at": "2021-07-15T17:52:53.729Z", "delivery": false, "delivery_address": null, "delivery_fee": null, "delivery_instructions": null, "discount": 0, "id": 2, "name": "Shuu's Order", "notes": null, "order_delivery_time": null, "order_id": null, "order_ready_time": null, "order_submitted_time": null, "paid": null, "pickup_address": "", "promo_applied": null, "promo_code": null, "rated": null, "ratings": null, "review": null, "seller": [1], "status": "In Process", "tax": null, "tip": 0, "total": null, "type": "Pick Up", "updated_at": "2021-07-15T17:52:53.729Z"}
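If it helps to confirm the format before uploading, a quick local check along these lines may be useful. This is only a sketch: is_newline_delimited_json is a hypothetical helper name, and the sample dictionary is a made-up, trimmed-down stand-in for the real API response.

```python
import json


def is_newline_delimited_json(text: str) -> bool:
    """Return True if every non-empty line of `text` is a standalone JSON object."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return False
    for line in lines:
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            return False
        if not isinstance(value, dict):
            return False
    return True


# Hypothetical sample shaped like the API response in the question.
sample = {"records": [{"id": 1, "name": "Oak's Order"}, {"id": 2, "name": "Shuu's Order"}]}

# The original approach: one pretty-printed object spread over many lines.
pretty = json.dumps(sample, indent=4, sort_keys=True)

# The modified approach: one compact JSON object per line.
ndjson = "\n".join(json.dumps(record) for record in sample["records"])
```

Here the check passes for ndjson and fails for pretty, because the first line of the pretty-printed text is just an opening brace and does not parse on its own.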

In addition, as @CaioT also pointed out in a comment, your second function's signature needs to accept two arguments, event and context, according to the GCS trigger event definition.
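As a rough illustration of that signature, the event dictionary for a storage-triggered background function carries the bucket and object name, so the URI does not have to be hard-coded. The sketch below assumes that payload shape; gcs_uri_from_event is a hypothetical helper, and the BigQuery load itself is left out so the snippet runs without any Google Cloud libraries.

```python
def gcs_uri_from_event(event: dict) -> str:
    """Build the gs:// URI of the object that fired the trigger."""
    return f"gs://{event['bucket']}/{event['name']}"


def function_2(event, context):
    # `event` describes the finalized object; `context` carries event metadata.
    uri = gcs_uri_from_event(event)
    # In the real function this URI would be passed to
    # client.load_table_from_uri(...) as in the question's code.
    print(f"Object to load: {uri}")
    return uri
```

Calling function_2({"bucket": "bucket_name", "name": "blob_name"}, None) locally yields the same gs://bucket_name/blob_name URI that the question hard-codes.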

In addition, please consider reviewing the definition of the order_items field in your BigQuery schema: according to your JSON, that field does not exist.
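One way to spot this kind of mismatch before running the load job is to compare the schema's field names with the keys that actually occur in the records. The following is a minimal sketch; fields_missing_from_records is a hypothetical helper, and the record shown is a trimmed-down stand-in for the real data.

```python
def fields_missing_from_records(schema_names, records):
    """Return schema field names that appear in none of the records."""
    seen = set()
    for record in records:
        seen.update(record.keys())
    return [name for name in schema_names if name not in seen]


# Field names from the question's schema (the elided ones are left out).
schema_names = ["order_items", "created_at", "updated_at"]

# A trimmed-down record shaped like the sample data above.
records = [
    {
        "id": 1,
        "created_at": "2021-07-15T17:44:31.064Z",
        "updated_at": "2021-07-15T17:44:31.064Z",
    },
]

missing = fields_missing_from_records(schema_names, records)
print(missing)  # ['order_items']
```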

Also pay attention to the limitations BigQuery imposes when importing JSON data, especially when dealing with timestamps.
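If you want to catch malformed timestamp strings before the load job does, a local pre-check might look like the sketch below. It assumes every timestamp follows the pattern seen in the sample data (for example 2021-07-15T17:44:31.064Z); find_bad_timestamps is a hypothetical helper, and it only verifies that values parse, not that BigQuery will accept them.

```python
from datetime import datetime

# Assumed format of the API's timestamps, e.g. "2021-07-15T17:44:31.064Z".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def find_bad_timestamps(records, fields):
    """Return (record index, field, value) for values that do not parse."""
    bad = []
    for index, record in enumerate(records):
        for field in fields:
            value = record.get(field)
            if value is None:
                continue  # null or absent values are skipped
            try:
                datetime.strptime(value, TIMESTAMP_FORMAT)
            except (TypeError, ValueError):
                bad.append((index, field, value))
    return bad
```

Running it over the records with fields such as created_at and updated_at returns an empty list when every value matches the assumed format.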

Finally, be sure your function has the necessary permissions to interact with BigQuery.

By default, at runtime your function will assume your App Engine service account, although you can provide a specific service account as well. In either case, be sure the service account has the necessary permissions over BigQuery and your BigQuery table. Basically, your service account must be bigquery.user and be WRITER (or equivalently, bigquery.dataEditor) of your dataset. Please see the examples provided in the GCP documentation.
