I have a deeply nested JSON document of variable length, whose arrays vary from document to document. I want to unnest certain sections, write them to BigQuery, and disregard the rest.
I was excited about Dataprep by Trifacta, but because Trifacta would have access to the data, it will not work for my company: we handle healthcare data and have only authorized Google.
Has anyone worked with other GCP solutions for transforming JSON? The documents are so long and so deeply nested that writing custom regexes and running them on a pod before ingestion takes significant compute.
You can try this:
[1] Flatten the JSON document using jq:
jq -c '.[]' source.json > target.json
[2] Load the transformed JSON file (using autodetect):
bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON mydataset.mytable target.json
Result:
BigQuery will automatically create RECORD (STRUCT) columns for the nested fields.
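If jq's generic flattening is not selective enough — you want to keep some nested sections and drop others — a small preprocessing script can do the unnesting before `bq load`. Below is a minimal sketch in Python; the field names (`patients`, `observations`, `id`, `code`, `value`) are hypothetical stand-ins for your document's actual structure, not anything from your data:

```python
import json

def unnest(document):
    """Yield one flat row per entry of a chosen nested array,
    copying down the parent fields we want to keep.
    Field names here are hypothetical placeholders."""
    for patient in document.get("patients", []):
        base = {"patient_id": patient.get("id")}
        for obs in patient.get("observations", []):
            row = dict(base)
            row["code"] = obs.get("code")
            row["value"] = obs.get("value")
            yield row  # anything not copied above is disregarded

def to_ndjson(document):
    """Serialize rows as newline-delimited JSON, the format bq load expects."""
    return "\n".join(json.dumps(row) for row in unnest(document))

if __name__ == "__main__":
    doc = {
        "patients": [
            {"id": "p1",
             "observations": [{"code": "bp", "value": 120, "note": "dropped"}]}
        ],
        "metadata": {"deeply": {"nested": "and disregarded"}},
    }
    print(to_ndjson(doc))
```

Writing the output to a file and loading it with the `bq load` command above would then produce one BigQuery row per observation.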
Dataflow can also be useful for this purpose: it runs the transformation as a managed pipeline inside your own GCP project and can write results directly to BigQuery, which may suit your healthcare-data constraint better than a third-party tool.