Google BigQuery SQL: Parse multi level JSON (list +json+list+json) to Columns
Google BigQuery SQL: Extract data from JSON (list and array) into columns
I have a table with a JSON string column:
UserID json_string
100 [{"id": 77379513, "value": "35.4566", "os_type": null, "amount": "200", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "same'}]
100 [{"id": 77379514, "value": "38.658", "os_type": null, "amount": "100", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "niko'}]
100 [{"id": 77379515, "value": "40.569", "os_type": null, "amount": "150", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "koko'}]
200 [{"id": 77378899, "value": "25.365", "os_type": null, "amount": "100", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "same'}]
200 [{"id": 77378900, "value": "35.898", "os_type": null, "amount": "500", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "niko'}]
200 [{"id": 77378901, "value": "41.258", "os_type": null, "amount": "400", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "koko'}]
In the end, I need to convert the strings into columns:
UserID ID value os_type amount created_at updated_at Type_name
100 77379513 35.4566 null 200 2020-08-16T14:48:27.611-04:00 2020-08-16T14:48:27.611-04:00 same
100 77379514 38.658 null 100 2020-08-16T14:48:27.611-04:00 2020-08-16T14:48:27.611-04:01 niko
100 77379515 40.569 null 150 2020-08-16T14:48:27.611-04:00 2020-08-16T14:48:27.611-04:02 koko
200 77378899 25.365 null 100 2020-09-16T14:48:27.611-04:01 2020-08-17T14:48:27.611-04:03 same
200 77378900 35.898 null 500 2020-09-16T14:48:27.611-04:02 2020-08-17T14:48:27.611-04:04 niko
200 77378901 41.258 null 400 2020-09-16T14:48:27.611-04:03 2020-08-17T14:48:27.611-04:05 koko
First, I tried to extract the JSON objects from the list:
SELECT UserID, json_extract_array(json_string) as json_array
FROM `project.dataset.table`
Then I get a table like this:
UserID json_array
100 {"id": 77379513, "value": "35.4566", "os_type": null, "amount": "200", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "same'}
100 {"id": 77379514, "value": "38.658", "os_type": null, "amount": "100", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "niko'}
100 {"id": 77379515, "value": "40.569", "os_type": null, "amount": "150", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "koko'}
200 {"id": 77378899, "value": "25.365", "os_type": null, "amount": "100", "created_at": "2020-09-16T14:48:27.611-04:00", "updated_at": "2020-08-17T14:48:27.836-04:00", "Type_name": "same'}
200 {"id": 77378900, "value": "35.898", "os_type": null, "amount": "500", "created_at": "2020-09-16T14:48:27.611-04:00", "updated_at": "2020-08-17T14:48:27.836-04:00", "Type_name": "niko'}
200 {"id": 77378901, "value": "41.258", "os_type": null, "amount": "400", "created_at": "2020-09-16T14:48:27.611-04:00", "updated_at": "2020-08-17T14:48:27.836-04:00", "Type_name": "koko'}
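From this step I tried to use the JSON_EXTRACT_SCALAR function, but I got an error saying that this function does not work on arrays. So what is the correct way to extract the data into columns?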
The following should work for you:
select UserID,
json_extract_scalar(json, '$.id') as id,
json_extract_scalar(json, '$.value') as value,
json_extract_scalar(json, '$.os_type') as os_type,
json_extract_scalar(json, '$.amount') as amount,
json_extract_scalar(json, '$.created_at') as created_at,
json_extract_scalar(json, '$.updated_at') as updated_at,
json_extract_scalar(json, '$.Type_name') as Type_name
from `project.dataset.table`,
unnest(json_extract_array(json_string, '$')) json
You can test it against the sample data from your question by putting this CTE in front of the query above:
with `project.dataset.table` as (
select 100 UserID, '[{"id": 77379513, "value": "35.4566", "os_type": null, "amount": "200", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "same"}]' json_string union all
select 100, '[{"id": 77379514, "value": "38.658", "os_type": null, "amount": "100", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "niko"}]' union all
select 100, '[{"id": 77379515, "value": "40.569", "os_type": null, "amount": "150", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "koko"}]' union all
select 200, '[{"id": 77378899, "value": "25.365", "os_type": null, "amount": "100", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "same"}]' union all
select 200, '[{"id": 77378900, "value": "35.898", "os_type": null, "amount": "500", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "niko"}]' union all
select 200, '[{"id": 77378901, "value": "41.258", "os_type": null, "amount": "400", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "koko"}]'
)
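The output is one row per JSON object, with the columns UserID, id, value, os_type, amount, created_at, updated_at and Type_name.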
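One thing to keep in mind: json_extract_scalar returns STRING, so every extracted column is a string (and a JSON null such as os_type comes back as SQL NULL). If typed columns are needed, a rough sketch with SAFE_CAST could look like the query below. The target types (INT64, NUMERIC, TIMESTAMP), the CTE name sample and the alias item are assumptions made for illustration, not something stated in the question.

```sql
-- Sketch only: target types are guesses based on how the sample values look.
with sample as (
  select 100 as UserID,
         '[{"id": 77379513, "value": "35.4566", "os_type": null, "amount": "200", "created_at": "2020-08-16T14:48:27.611-04:00", "updated_at": "2020-08-16T14:48:27.836-04:00", "Type_name": "same"}]' as json_string
)
select UserID,
  -- SAFE_CAST yields NULL instead of an error when a value does not parse
  safe_cast(json_extract_scalar(item, '$.id') as int64) as id,
  safe_cast(json_extract_scalar(item, '$.value') as numeric) as value,
  json_extract_scalar(item, '$.os_type') as os_type,
  safe_cast(json_extract_scalar(item, '$.amount') as int64) as amount,
  safe_cast(json_extract_scalar(item, '$.created_at') as timestamp) as created_at,
  safe_cast(json_extract_scalar(item, '$.updated_at') as timestamp) as updated_at,
  json_extract_scalar(item, '$.Type_name') as Type_name
from sample,
unnest(json_extract_array(json_string, '$')) as item
```

SAFE_CAST is used rather than CAST so that a single malformed value produces NULL in that cell instead of failing the whole query.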
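Note: in a few places you used ' instead of " (for example "same'), so this was "fixed" in the sample data used above. If you have no control over the values in this table and cannot change ' to ", you can use the following instead: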
select UserID,
json_extract_scalar(json, '$.id') as id,
json_extract_scalar(json, '$.value') as value,
json_extract_scalar(json, '$.os_type') as os_type,
json_extract_scalar(json, '$.amount') as amount,
json_extract_scalar(json, '$.created_at') as created_at,
json_extract_scalar(json, '$.updated_at') as updated_at,
json_extract_scalar(json, '$.Type_name') as Type_name
from `project.dataset.table`,
unnest(json_extract_array(replace(json_string, "'", '"'), '$')) json
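Note the change inside unnest: replace(json_string, "'", '"') turns every ' into " before the array is parsed, which works around the issue.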
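One caveat with a blanket replace: it also rewrites legitimate apostrophes inside values (for example "it's"), which can corrupt the data or even break the JSON. If that is a concern, a narrower repair might target only a ' that sits directly before a closing brace, which is where the stray quotes appear in the sample. The sketch below assumes that this is the only place the bad quote occurs; the CTE name sample, the alias item and the note field are hypothetical and added only to show the difference.

```sql
-- Sketch only: assumes the stray ' always appears right before a closing }.
with sample as (
  select 100 as UserID,
         '''[{"id": 77379513, "note": "it's fine", "Type_name": "same'}]''' as json_string
)
select UserID,
  json_extract_scalar(item, '$.id') as id,
  json_extract_scalar(item, '$.note') as note,
  json_extract_scalar(item, '$.Type_name') as Type_name
from sample,
-- only a quote followed by optional whitespace and } is replaced,
-- so the apostrophe inside "it's fine" is left untouched
unnest(json_extract_array(regexp_replace(json_string, r"'\s*\}", '"}'), '$')) as item
```

A value that genuinely ends with an apostrophe right before } would still be mis-repaired, so it is worth checking the pattern against the real data first.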