JSON upload to BigQuery

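I am trying to automate uploading JSON data to BigQuery using two Cloud Functions, both of which deploy successfully, and a Cloud Scheduler job that runs successfully. After the Cloud Scheduler job runs, the data is uploaded to my Cloud Storage bucket, but it is not loaded into BigQuery.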

Below are my code and JSON data:

import json

import requests
from google.cloud import storage

# Function 1: triggered by HTTP
def function(request):
    url = "https://api...."
    headers = {"Content-Type" : "application/json",
            "Authorization" : "..."}
        
    response = requests.get(url, headers=headers)

    json_data = response.json()
    pretty_json = json.dumps(json_data, indent=4, sort_keys=True)

    storage_client = storage.Client()
    bucket = storage_client.bucket("bucket_name")
    blob = bucket.blob("blob_name")

    blob.upload_from_string(pretty_json)

from google.cloud import bigquery

# Function 2: triggered by Cloud Storage -> event type finalize/create
def function_2(data, context):
    client = bigquery.Client()

    # The Python client expects the dot-separated form project.dataset.table
    table_id = "booming-post-322920.dataset_name.table_name"

    job_config = bigquery.LoadJobConfig()
    job_config.schema=[
        bigquery.SchemaField("order_items", "INTEGER"),
        bigquery.SchemaField("created_at", "TIMESTAMP"),
        .....,     
        bigquery.SchemaField("updated_at", "TIMESTAMP")
    ]

    job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON

    uri = 'gs://bucket_name/blob_name' 

    load_job = client.load_table_from_uri(
        uri,
        table_id,
        location="US",  
        job_config=job_config
    ) 

    load_job.result()  

This is what my JSON data (pretty_json) looks like:

{
    "records": [
        {
            "active": null,
            "approved": null,
            "buyer": [
                1
            ],
            "cancel_reason": null,
            "cancelled": null,
            "chef": [
                1
            ],
            "completed": null,
            "created_at": "2021-07-15T17:44:31.064Z",
            ...

Please advise.

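I think the main problem is the format of your JSON file: you are specifying newline-delimited JSON ( bigquery.SourceFormat.NEWLINE_DELIMITED_JSON ), as BigQuery requires, but your JSON does not conform to that format. The file you upload is a single pretty-printed object spread over many lines, whereas newline-delimited JSON needs one complete JSON object per line.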

Please consider the following modification to your first function:

import json

import requests
from google.cloud import storage


def function(request):
    url = "https://api...."
    headers = {"Content-Type": "application/json",
               "Authorization": "..."}

    response = requests.get(url, headers=headers)

    json_data = response.json()

    # Serialize each record on its own line: one JSON object per line
    records = [json.dumps(record) for record in json_data["records"]]
    records_data = "\n".join(records)

    storage_client = storage.Client()
    bucket = storage_client.bucket("bucket_name")
    blob = bucket.blob("blob_name")

    blob.upload_from_string(records_data)

    # An HTTP-triggered function should return a response
    return "OK"

Your JSON will now look like this:

{"active": null, "approved": null, "buyer": [1], "cancel_reason": null, "cancelled": null, "chef": [1], "completed": null, "created_at": "2021-07-15T17:44:31.064Z", "delivery": false, "delivery_address": null, "delivery_fee": null, "delivery_instructions": null, "discount": 0, "id": 1, "name": "Oak's Order", "notes": null, "order_delivery_time": null, "order_id": null, "order_ready_time": null, "order_submitted_time": null, "paid": null, "pickup_address": "", "promo_applied": null, "promo_code": null, "rated": null, "ratings": null, "review": null, "seller": [1], "status": "In Process", "tax": null, "tip": 0, "total": null, "type": "Pick Up", "updated_at": "2021-07-15T17:44:31.064Z"}
{"active": null, "approved": null, "buyer": [2], "cancel_reason": null, "cancelled": null, "chef": [1], "completed": null, "created_at": "2021-07-15T17:52:53.729Z", "delivery": false, "delivery_address": null, "delivery_fee": null, "delivery_instructions": null, "discount": 0, "id": 2, "name": "Shuu's Order", "notes": null, "order_delivery_time": null, "order_id": null, "order_ready_time": null, "order_submitted_time": null, "paid": null, "pickup_address": "", "promo_applied": null, "promo_code": null, "rated": null, "ratings": null, "review": null, "seller": [1], "status": "In Process", "tax": null, "tip": 0, "total": null, "type": "Pick Up", "updated_at": "2021-07-15T17:52:53.729Z"}
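
As a quick sanity check on the format, each line of a newline-delimited JSON file should parse on its own as a single JSON object. The following is a minimal sketch using only the standard library; `to_ndjson` and `is_ndjson` are hypothetical helper names, and the sample payload only mirrors the shape of the data in the question.

```python
import json


def to_ndjson(payload):
    """Serialize payload["records"] as one compact JSON object per line."""
    return "\n".join(json.dumps(record) for record in payload["records"])


def is_ndjson(text):
    """Return True if every non-empty line parses as a standalone JSON object."""
    for line in text.splitlines():
        if not line.strip():
            continue
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            return False
        if not isinstance(value, dict):
            return False
    return True


# Sample payload shaped like the API response in the question (values are illustrative).
payload = {"records": [{"id": 1, "buyer": [1]}, {"id": 2, "buyer": [2]}]}

pretty = json.dumps(payload, indent=4, sort_keys=True)  # what the original function uploaded
ndjson = to_ndjson(payload)                             # one record per line

print(is_ndjson(pretty))  # False
print(is_ndjson(ndjson))  # True
```

The pretty-printed form fails because its first line is just an opening brace, which is not a complete JSON value.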

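In addition, in your second function, as @CaioT pointed out in his/her comment, you need to change your function signature to accept two arguments, event and context, according to the GCS storage trigger event definition.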
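
A background function triggered by a Cloud Storage finalize event receives the event payload as a dictionary together with a context object, and the payload includes the `bucket` and `name` of the object that was written. A rough sketch of the second function under that assumption follows. `gcs_uri_from_event` is a hypothetical helper, the table ID is a placeholder, the schema is left out, and `google-cloud-bigquery` is imported inside the function only so that the helper can be tried without the library installed.

```python
def gcs_uri_from_event(event):
    """Build the gs:// URI of the object that triggered the function."""
    return f"gs://{event['bucket']}/{event['name']}"


def function_2(event, context):
    """Background function for a Cloud Storage finalize trigger."""
    # Imported lazily for this sketch; a top-level import is more usual when deployed.
    from google.cloud import bigquery

    client = bigquery.Client()

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        # schema=[...]  # the schema fields from the question would go here
    )

    load_job = client.load_table_from_uri(
        gcs_uri_from_event(event),
        "project_id.dataset_name.table_name",  # placeholder table ID
        job_config=job_config,
    )
    load_job.result()  # wait for the load job; raises if the job failed


# The event dict below is a trimmed, illustrative payload.
print(gcs_uri_from_event({"bucket": "bucket_name", "name": "blob_name"}))
```

Deriving the URI from the event, rather than hard-coding it, means the function loads whichever object actually triggered it.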

Also, please review the order_items field in your BigQuery schema definition: according to your JSON, that field does not exist.
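
One way to spot such mismatches before running the load job is to compare the field names declared in the schema with the keys that actually appear in the records. This is only a sketch: `compare_schema_to_records` is a hypothetical helper, the schema names are the three visible in the question (the elided ones are not filled in), and the records are trimmed to a few keys.

```python
import json


def compare_schema_to_records(schema_names, ndjson_text):
    """Return (missing_from_data, missing_from_schema) as sorted lists."""
    seen_keys = set()
    for line in ndjson_text.splitlines():
        if line.strip():
            seen_keys.update(json.loads(line).keys())
    schema = set(schema_names)
    return sorted(schema - seen_keys), sorted(seen_keys - schema)


schema_names = ["order_items", "created_at", "updated_at"]
ndjson_text = "\n".join([
    '{"id": 1, "created_at": "2021-07-15T17:44:31.064Z", "updated_at": "2021-07-15T17:44:31.064Z"}',
    '{"id": 2, "created_at": "2021-07-15T17:52:53.729Z", "updated_at": "2021-07-15T17:52:53.729Z"}',
])

missing_from_data, missing_from_schema = compare_schema_to_records(schema_names, ndjson_text)
print(missing_from_data)    # ['order_items']
print(missing_from_schema)  # ['id']
```

Keys that appear in the data but not in the schema are worth attention too, since by default BigQuery treats them as errors unless `ignore_unknown_values` is set on the load job configuration.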

Also be aware of the limitations BigQuery imposes when importing JSON data, especially when dealing with timestamps.
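
The timestamps in the sample data are ISO 8601 strings with millisecond precision and a trailing `Z`. A small pre-flight check, assuming every timestamp field uses exactly that layout, might look like the following. `TIMESTAMP_FORMAT` and `invalid_timestamps` are hypothetical names, and the check only validates the string layout locally; it does not reproduce BigQuery's own parsing rules.

```python
from datetime import datetime

# Layout of the timestamps shown in the question, e.g. 2021-07-15T17:44:31.064Z
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def invalid_timestamps(record, fields=("created_at", "updated_at")):
    """Return the names of fields whose non-null value does not match the layout."""
    bad = []
    for field in fields:
        value = record.get(field)
        if value is None:
            continue  # nulls are left for the schema mode to accept or reject
        try:
            datetime.strptime(value, TIMESTAMP_FORMAT)
        except (TypeError, ValueError):
            bad.append(field)
    return bad


good = {"created_at": "2021-07-15T17:44:31.064Z", "updated_at": None}
bad = {"created_at": "15/07/2021 17:44", "updated_at": "2021-07-15T17:52:53.729Z"}

print(invalid_timestamps(good))  # []
print(invalid_timestamps(bad))   # ['created_at']
```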

Finally, make sure your function has the permissions it needs to interact with BigQuery.

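By default, at runtime your function will assume your App Engine service account, although you can also provide a specific service account. In either case, please make sure that the service account has the necessary permissions over BigQuery and your BigQuery table. Basically, your service account must be bigquery.user and be a WRITER of the dataset (or have the equivalent bigquery.dataEditor role). See the examples provided in the GCP documentation.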

