I want to extract JSON-format data with BigQuery: UDF or JSON_EXTRACT?
I have a table with the following structure.
user_id int,
purchase_ids string (in JSON format)
The JSON contained in one record of this table looks like this:
user_id = 0001
1:{
shop_id:1,
product_id :1111,
value: 1
},
2:{
shop_id:1,
product_id :2222,
value: 1
},
3:{
shop_id:1,
product_id :3333,
value: 1
},
.... (the number of entries varies from record to record)
Desired final output:
| user_id | shop_id | product_id | value |
| 0001 | 1 | 1111 | 1 |
| 0001 | 1 | 2222 | 1 |
| 0001 | 1 | 3333 | 1 |
I tried the following query, but it does not work correctly: shop_id and product_id return null.
CREATE TEMP FUNCTION jsonparse(json_row STRING)
RETURNS STRING
LANGUAGE js AS """
var res = array();
json_row.forEach(([key, value]) => {
res = value;
});
return res
""";
with
parse as(
select
user_id,
jsonparse(purchase_ids) as pids
from
sample
)
select
user_id,
JSON_EXTRAXT(pid,"$.shop_id") as shop_id,
JSON_EXTRAXT(pid,"$.product_id") as product_id
from
parse,
unnest(pids,",") pid
How do I get this right in this situation?
From my point of view, your use case calls for a NESTED and REPEATED column, which can represent this JSON structure. For example, the following query returns the result you are looking for:
WITH users AS
(SELECT "0001" as user_id, ARRAY<STRUCT<shop_id INT64, product_id INT64, value INT64>>[(1, 1111,1),
(1, 2222,1), (1, 3333,1)] AS shops)
SELECT u.user_id, s.*
FROM users u, UNNEST(shops) s;
For simplicity, you can create this type of column from the Console to try this approach by following this guide.
Below is a working version of your use case (BigQuery Standard SQL). It assumes purchase_ids holds a valid JSON array of objects:
#standardSQL
CREATE TEMP FUNCTION jsonparse(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return JSON.parse(input).map(x=>JSON.stringify(x));
""";
WITH sample AS (
SELECT "0001" AS user_id,
'''[{"shop_id": 1, "product_id" :1111, "value": 1},
{"shop_id": 1, "product_id" :2222, "value": 1},
{"shop_id": 1, "product_id" :3333, "value": 1}]''' AS purchase_ids
), parse AS (
SELECT user_id,
jsonparse(purchase_ids) AS pids
FROM sample
)
SELECT
user_id,
JSON_EXTRACT(pid,"$.shop_id") AS shop_id,
JSON_EXTRACT(pid,"$.product_id") AS product_id,
JSON_EXTRACT(pid,"$.value") AS value
FROM parse,
UNNEST(pids) pid
with result
Row user_id shop_id product_id value
1 0001 1 1111 1
2 0001 1 2222 1
3 0001 1 3333 1
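The query above feeds the UDF a JSON array, while the sample in the question is an object keyed by "1", "2", "3". If the stored string really is such an object, `JSON.parse(input).map(...)` would fail, because plain objects have no `map` method. Below is a minimal sketch of a UDF body that handles both shapes. It assumes purchase_ids contains valid JSON (quoted keys, no trailing comma), which the abbreviated sample in the question does not show, and it assumes `Object.values` is available in the UDF runtime; `Object.keys(parsed).map(k => parsed[k])` would be an equivalent fallback.

```javascript
// Sketch only: the function name mirrors the UDF above, the sample string is made up.
function jsonparse(input) {
  const parsed = JSON.parse(input);
  // Accept a top-level array as well as an object keyed by "1", "2", ...
  const items = Array.isArray(parsed) ? parsed : Object.values(parsed);
  // Return one JSON string per purchase, so BigQuery can UNNEST the array
  // and apply JSON_EXTRACT to each element.
  return items.map((item) => JSON.stringify(item));
}

// Hypothetical record in the keyed-object shape described in the question.
const sample = JSON.stringify({
  "1": { shop_id: 1, product_id: 1111, value: 1 },
  "2": { shop_id: 1, product_id: 2222, value: 1 },
  "3": { shop_id: 1, product_id: 3333, value: 1 },
});

const pids = jsonparse(sample);
console.log(pids.length); // 3
console.log(pids[0]);     // {"shop_id":1,"product_id":1111,"value":1}
```

Inside BigQuery, only the three statements of the function body would go between the triple quotes of `CREATE TEMP FUNCTION ... LANGUAGE js AS`; the rest of the query can stay as it is.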