Split JSON into columns in a dynamic way in BigQuery

I have the following JSON:

{
    "rewards": {
        "reward_1": {
            "type": "type 1",
            "amount": "amount 1"
        },
        "reward_2": {
            "type": "type 2",
            "amount": "amount 2"
        },
        "reward_3": {
            "type": "type 3",
            "amount": "amount 3"
        },
        "reward_4": {
            "type": "type 4",
            "amount": "amount 4"
        }
    }
}

This JSON is dynamic, and I don't know in advance how many rewards it will contain: here there are 4, but there could be 2, 8, and so on.

I want to write a BigQuery query that parses these values dynamically, without knowing how many of them exist, and then splits them into columns, like this:

[image: desired output]

Thank you!

Hope this helps.

  1. Since the JSON is dynamic, the first step is to find the highest reward sequence number. (I've used a regular expression and the max_reward UDF.)
  2. Then extract each reward from the JSON rewards field iteratively, one sequence number at a time.
  3. Lastly, reshape the result into wide form using a PIVOT query.

If you want a more generic solution, you need to use BigQuery dynamic SQL to generate the PIVOT columns. I've hard-coded them in the query:

('reward_1', 'reward_2', 'reward_3', 'reward_4')

query:

CREATE TEMP TABLE sample AS 
SELECT 1 AS id, '{"rewards": { "reward_1": { ... ' AS json -- put your JSON here
 UNION ALL
SELECT 2 AS id, '{"rewards": { "reward_1": { ... ' AS json -- put another JSON here
;

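-- Returns the reward numbered `seq` from the JSON text of the `rewards` object.
CREATE TEMP FUNCTION extract_reward(json STRING, seq INT64)
RETURNS STRUCT<type STRING, amount STRING>
LANGUAGE js AS """
  return JSON.parse(json)['reward_' + seq];
""";

-- Returns the largest reward number among the captured digit strings.
CREATE TEMP FUNCTION max_reward(arr ARRAY<STRING>) AS ((
  SELECT MAX(CAST(v AS INT64)) FROM UNNEST(arr) v
));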

SELECT * FROM (
  SELECT id,
         'reward_' || seq AS reward, 
         extract_reward(FORMAT('%t', JSON_QUERY(json, '$.rewards')), seq) AS value
    FROM sample, UNNEST(GENERATE_ARRAY(1, max_reward(REGEXP_EXTRACT_ALL(json, r'"reward_([0-9]+)"')))) seq
)  PIVOT (ANY_VALUE(value) FOR reward IN ('reward_1', 'reward_2', 'reward_3', 'reward_4'));

output:

[image: query result]
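
For the more generic variant mentioned above, here is a rough sketch of how the PIVOT column list could be generated with dynamic SQL instead of being hard-coded. It assumes it runs in the same script as the sample temp table and the two UDFs. The table name rewards_long is made up for illustration, and I haven't run this against your data, so treat it as a starting point rather than a finished solution.

```sql
-- Hypothetical helper table: the long form, one row per (id, reward).
CREATE TEMP TABLE rewards_long AS
SELECT id,
       seq,
       'reward_' || seq AS reward,
       extract_reward(FORMAT('%t', JSON_QUERY(json, '$.rewards')), seq) AS value
  FROM sample,
       UNNEST(GENERATE_ARRAY(1, max_reward(REGEXP_EXTRACT_ALL(json, r'"reward_([0-9]+)"')))) seq;

-- Build the IN (...) list from the reward names that actually occur,
-- ordered by sequence number, then run the resulting PIVOT query.
EXECUTE IMMEDIATE (
  SELECT FORMAT("""
    SELECT * FROM (SELECT id, reward, value FROM rewards_long)
    PIVOT (ANY_VALUE(value) FOR reward IN (%s))
  """, STRING_AGG(FORMAT("'%s'", reward), ', ' ORDER BY seq))
  FROM (SELECT DISTINCT seq, reward FROM rewards_long)
);
```

Rows with fewer rewards than the maximum should get NULL in the extra columns. If rewards_long is empty, the generated statement is NULL and EXECUTE IMMEDIATE will fail, so you may want to guard against that case.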

▶ Split a reward STRUCT column into separate columns

SELECT * FROM (
  SELECT id,
         'reward_' || seq || '_' || IF (offset = 0, 'type', 'amount') AS reward,
         value
    FROM sample,
         UNNEST(GENERATE_ARRAY(1, max_reward(REGEXP_EXTRACT_ALL(json, r'"reward_([0-9]+)"')))) seq, 
         UNNEST([extract_reward(FORMAT('%t', JSON_QUERY(json, '$.rewards')), seq)]) pair,
         UNNEST([pair.type, pair.amount]) value WITH OFFSET
)  PIVOT (ANY_VALUE(value) FOR reward IN ('reward_1_type', 'reward_2_type', 'reward_3_type', 'reward_4_type', 'reward_1_amount', 'reward_2_amount', 'reward_3_amount', 'reward_4_amount'));

output:

[image: query result]
