
Loading JSON Array Into BigQuery

I am trying to load a JSON array into a BigQuery table. The structure of the data is as stated below:

[{"image":"testimage1","component":"component1"},{"image":"testimage2","component":"component2"}]

Each JSON record corresponds to one row in BigQuery. The columns in BigQuery are: image and component. When I try to ingest the data, it fails with a parsing error. If I change the structure to the following, it works:

 {"image":"testimage1","component":"component1"}{"image":"testimage2","component":"component2"}

I am trying to ingest it as NEWLINE_DELIMITED_JSON. Is there any way I can make the first JSON structure get ingested into BigQuery?

No. BigQuery's NEWLINE_DELIMITED_JSON format requires one complete JSON object per line, so a file whose top-level structure is an array cannot be ingested directly.

You have to transform it slightly:

  • Either wrap it into a single valid JSON object (add a {"object": at the beginning and finish the line with a } ). Ingest that JSON into a temporary table, then run a query that scans the temporary table and inserts the correct values into the target table.
  • Or remove the array brackets [] and replace each },{ with }\n{ to get one JSON object per line.
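As a sketch of the second option: rather than doing a raw string replacement of },{ (which could also match braces inside field values), it is safer to parse the array and re-serialize each element on its own line. The file name here is an assumption.

```python
import json

# Hypothetical raw payload: the JSON array from the question, as one string.
raw = '[{"image":"testimage1","component":"component1"},{"image":"testimage2","component":"component2"}]'

# Parse the array, then emit one JSON object per line (newline-delimited JSON).
lines = [json.dumps(obj) for obj in json.loads(raw)]
ndjson = "\n".join(lines) + "\n"

with open("data.ndjson", "w") as f:
    f.write(ndjson)
```

The resulting data.ndjson can then be loaded with source format NEWLINE_DELIMITED_JSON.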

Alternatively, you can ingest your JSON as a CSV file (you will have only one column, containing the raw JSON text) and then use BigQuery's string and JSON functions to transform the data and insert it into the target table.
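A minimal sketch of that CSV route, assuming hypothetical table and column names (my-project.my_dataset.staging with a single raw_json column): write each array element's raw JSON text into a one-column CSV, load it into a staging table, then extract the fields with BigQuery's JSON_EXTRACT_SCALAR.

```python
import csv
import json

# Hypothetical input: the JSON array from the question, as one raw string.
raw = '[{"image":"testimage1","component":"component1"},{"image":"testimage2","component":"component2"}]'

# Write each array element's raw JSON text into a single CSV column.
with open("raw-json.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for obj in json.loads(raw):
        writer.writerow([json.dumps(obj)])

# After loading raw-json.csv into the one-column staging table, a query
# along these lines could populate the target table (names are assumptions):
transform_sql = """
INSERT INTO `my-project.my_dataset.target` (image, component)
SELECT
  JSON_EXTRACT_SCALAR(raw_json, '$.image'),
  JSON_EXTRACT_SCALAR(raw_json, '$.component')
FROM `my-project.my_dataset.staging`
"""
```

csv.writer takes care of quoting the commas and double quotes inside the JSON text, so each row round-trips as a single column.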

You can follow this approach of looping through the list and writing it into a JSON file, then loading the JSON file into BigQuery:

from google.cloud import bigquery
import json

client = bigquery.Client(project="project-id")

dataset_id = "dataset-id"
table_id = "bqjson"

list_dict = [{"image": "testimage1", "component": "component1"},
             {"image": "testimage2", "component": "component2"}]

# Write one JSON object per line (newline-delimited JSON).
with open("sample-json-data.json", "w") as jsonwrite:
    for item in list_dict:
        jsonwrite.write(json.dumps(item) + "\n")

dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
job_config.autodetect = True

with open("sample-json-data.json", "rb") as source_file:
    job = client.load_table_from_file(
        source_file,
        table_ref,
        location="us",  # Must match the destination dataset location.
        job_config=job_config,
    )  # API request

job.result()  # Waits for the table load to complete.

print("Loaded {} rows into {}:{}.".format(job.output_rows, dataset_id, table_id))

Output:

(screenshot of the loaded table)
