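Presto unnest/map complex json
I have been on Stack Overflow for hours, going through examples of other Presto unnest/map/cast solutions, but I can't seem to find one that works for my data.
Here is a sample of my data: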
with test_data (id, messy_json) AS (
VALUES ('TEST_A', JSON '{"issue":[],"problem":[{"category":"math","id":2,"name":"subtraction"},{"category":"math","id":3,"name":"division"},{"category":"english","id":25,"name":"verbs"},{"category":"english","id":27,"name":"grammar"},{"category":"language","id":1,"name":"grammar"}],"version":4}'),
('TEST_B', JSON '{"problem":[],"version":4}'),
('TEST_C', JSON '{"version": 4, "problem": [], "issue": [null, null, null, null, null, null, null, null, null, null, null]}')
),
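The JSON column is semi-unstructured: it can hold multiple levels, and a row does not always have every key:value pair that other rows have.
I have been trying solutions such as: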
with test_data AS (
select id,
messy_json
from larger_tbl),
select
id as id,
json_extract_scalar(test_data, '$.version') as lvl1_version
json_extract_scalar(lvl2, '$.problem') as lvl2_id
from test
LEFT JOIN UNNEST(CAST(json_parse(messy_json) AS array(json))) AS x(lvl1) ON TRUE
LEFT JOIN UNNEST(CAST(json_extract(lvl1, '$.problem') AS array(json))) AS y(lvl2) ON TRUE
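This gives me cast errors, among others. I have also tried some variations using
unnest(cast(json_col as map(varchar, map(varchar,varchar)) options.
My goal is to explode the whole dataset into long format, retaining the ID and all keys, including the nested ones. I appreciate any input/guidance, thanks!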
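To be honest, your current query does not make much sense given the sample data and the stated goal:
- UNNEST is typically used with cross join rather than left join (I use the succinct comma syntax, which skips the keyword entirely).
- json_parse(messy_json) is not needed for your test data, because the value is already JSON (though I assume that in the real data it is just a varchar field, in which case it should be kept).
- json_extract_scalar(lvl2, '$.problem') as lvl2_id should extract '$.id' instead, because the problem array has already been unnested at that point.
-- sample data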
with test_data (id, messy_json) AS (
VALUES ('TEST_A', JSON '{"issue":[],"problem":[{"category":"math","id":2,"name":"subtraction"},{"category":"math","id":3,"name":"division"},{"category":"english","id":25,"name":"verbs"},{"category":"english","id":27,"name":"grammar"},{"category":"language","id":1,"name":"grammar"}],"version":4}'),
('TEST_B', JSON '{"problem":[],"version":4}'),
('TEST_C', JSON '{"version": 4, "problem": [], "issue": [null, null, null, null, null, null, null, null, null, null, null]}')
)
-- query
select id
, json_extract_scalar(messy_json, '$.version') as lvl1_version
, json_extract_scalar(lvl2, '$.id') as lvl2_id
from test_data
, UNNEST(CAST(json_extract(messy_json, '$.problem')
AS array(json)), array[1]) AS y(lvl2, ignored);
Output:
ID | lvl1_version | lvl2_id |
---|---|---|
TEST_A | 4 | 2 |
TEST_A | 4 | 3 |
TEST_A | 4 | 25 |
TEST_A | 4 | 27 |
TEST_A | 4 | 1 |
TEST_B | 4 | NULL |
TEST_C | 4 | NULL |
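Since the stated goal is to keep all nested keys, not only id, the same pattern can be extended with one json_extract_scalar call per key. The following is a minimal sketch under that assumption, not a definitive solution: the column aliases lvl2_category and lvl2_name are made up for illustration, and the sample rows are shortened. The extra array[1] argument is kept from the query above; UNNEST with several arrays pads the shorter ones with NULL, so rows whose problem array is empty still produce one output row.

```sql
-- Shortened sample data; the column is a JSON literal, as in the query above.
WITH test_data (id, messy_json) AS (
    VALUES
        ('TEST_A', JSON '{"issue":[],"problem":[{"category":"math","id":2,"name":"subtraction"},{"category":"english","id":25,"name":"verbs"}],"version":4}'),
        ('TEST_B', JSON '{"problem":[],"version":4}')
)
SELECT id
     , json_extract_scalar(messy_json, '$.version') AS lvl1_version
     -- one column per key of the unnested "problem" objects
     , json_extract_scalar(lvl2, '$.id')       AS lvl2_id
     , json_extract_scalar(lvl2, '$.category') AS lvl2_category  -- hypothetical alias
     , json_extract_scalar(lvl2, '$.name')     AS lvl2_name      -- hypothetical alias
FROM test_data
   -- array[1] guarantees at least one row per id; "ignored" is never selected
   , UNNEST(CAST(json_extract(messy_json, '$.problem') AS array(json)), array[1]) AS y(lvl2, ignored);
```

Note that json_extract_scalar returns varchar, so numeric fields such as id would need an explicit CAST if typed columns are required.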