Loading JSON Array Into BigQuery
I am trying to load a JSON array into a BigQuery table. The structure of the data is as stated below:
[{"image":"testimage1","component":"component1"},{"image":"testimage2","component":"component2"}]
Each JSON record corresponds to 1 row in BigQuery. The columns in BigQuery are: image and component. When I try to ingest the data, it fails with a parsing error. If I change the structure to this, it works:
{"image":"testimage1","component":"component1"}{"image":"testimage2","component":"component2"}
I am trying to ingest as NEWLINE_DELIMITED_JSON. Is there any way I can make the first JSON structure get ingested into BigQuery?
No, only valid JSON can be ingested by BigQuery, and a valid JSON line doesn't start with an array. You have to transform it slightly. Either add {"object": at the beginning of each line and finish the line with a }, then ingest that JSON into a temporary table and run a query that scans the temporary table and inserts the correct values into the target table. Or remove the [] and replace every },{ with }\n{ so that you have one JSON object per line. Alternatively, you can ingest your JSON as a CSV file (you will have only 1 column, containing your raw JSON text) and then use the BigQuery string functions to transform the data and insert it into the target database.
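The transformation described above (dropping the [] and splitting the objects onto separate lines) can be sketched in Python. Rather than doing raw string replacement of },{ (which breaks if that sequence appears inside a value), a safer equivalent is to parse the array and re-serialize each element on its own line; the helper name array_to_ndjson is just for illustration:

```python
import json

def array_to_ndjson(raw: str) -> str:
    """Convert a JSON array string into newline-delimited JSON."""
    records = json.loads(raw)  # parse the whole array
    return "\n".join(json.dumps(rec) for rec in records)

raw = '[{"image":"testimage1","component":"component1"},{"image":"testimage2","component":"component2"}]'
print(array_to_ndjson(raw))
```

The resulting string (or a file containing it) can then be loaded with source format NEWLINE_DELIMITED_JSON.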
You can follow this approach of looping through the list and writing it into a JSON file, then loading the JSON file into BigQuery.
from google.cloud import bigquery
import json

client = bigquery.Client(project="project-id")
dataset_id = "dataset-id"
table_id = "bqjson"

list_dict = [{"image": "testimage1", "component": "component1"},
             {"image": "testimage2", "component": "component2"}]

# Write the list out as a newline-delimited JSON file.
with open("sample-json-data.json", "w") as jsonwrite:
    for item in list_dict:
        jsonwrite.write(json.dumps(item) + "\n")

dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
job_config.autodetect = True

with open("sample-json-data.json", "rb") as source_file:
    job = client.load_table_from_file(
        source_file,
        table_ref,
        location="us",  # Must match the destination dataset location.
        job_config=job_config,
    )  # API request

job.result()  # Waits for the table load to complete.
print("Loaded {} rows into {}:{}.".format(job.output_rows, dataset_id, table_id))
Output: