Why does Snowflake change the order of JSON values when flattening them into a list?
I have JSON objects stored in a table, and I am trying to write a query that returns the first element of that JSON.
Replication script
create table staging.par.test_json (id int, val varchar(2000));
insert into staging.par.test_json values (1, '{"list":[{"element":"Plumber"},{"element":"Craft"},{"element":"Plumbing"},{"element":"Electrics"},{"element":"Electrical"},{"element":"Tradesperson"},{"element":"Home services"},{"element":"Housekeepings"},{"element":"Electrical Goods"}]}');
insert into staging.par.test_json values (2,'
{
"list": [
{
"element": "Wholesale jeweler"
},
{
"element": "Fashion"
},
{
"element": "Industry"
},
{
"element": "Jewelry store"
},
{
"element": "Business service"
},
{
"element": "Corporate office"
}
]
}');
with cte_get_cats AS
(
select id,
val as category_list
from staging.par.test_json
),
cats_parse AS
(
select id,
parse_json(category_list) as c
from cte_get_cats
),
distinct_cats as
(
select id,
INDEX,
UPPER(cast(value:element AS varchar)) As c
from
cats_parse,
LATERAL flatten(INPUT => c:"list")
order by 1,2
) ,
cat_array AS
(
SELECT
id,
array_agg(DISTINCT c) AS sds_categories
FROM
distinct_cats
GROUP BY 1
),
sds_cats AS
(
select id,
cast(sds_categories[0] AS varchar) as sds_primary_category
from cat_array
)
select * from sds_cats;
Value: category
{"list":[{"element":"Plumber"},{"element":"Craft"},{"element":"Plumbing"},{"element":"Electrics"},{"element":"Electrical"},{"element":"Tradesperson"},{"element":"Home services"},{"element":"Housekeepings"},{"element":"Electrical Goods"}]}
Flattening it into a list gives me
["Plumber","Craft","Plumbing","Electrics","Electrical","Tradesperson","Home services","Housekeepings","Electrical Goods"]
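Problem: this order is not always the same. Snowflake seems to change the order, and sometimes it rearranges the values alphabetically. How can I make the order static? I do not want it to change.
The problem is the way you use ARRAY_AGG: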
array_agg(DISTINCT c) AS sds_categories
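Specified like this, it gives Snowflake no guidance on how to arrange the contents of the array. You should not assume that the array will be built in the same order as its input records: it may be, but that is not guaranteed. So you might want to do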
array_agg(DISTINCT c) within group (order by index) AS sds_categories
But that does not work: once you use DISTINCT c, the index value that belongs to each c is no longer defined. If you do not actually need DISTINCT, this will work
array_agg(c) within group (order by index) AS sds_categories
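To see the effect of WITHIN GROUP in isolation, here is a minimal sketch that does not depend on the test table. The inline rows and the names t, idx and c are made up for illustration; they stand in for the output of FLATTEN and are deliberately listed out of order.

```sql
-- Hypothetical inline rows (id, idx, c) standing in for flattened JSON elements.
-- They are listed out of order on purpose.
select id,
       array_agg(c) within group (order by idx) as sds_categories
from (values (1, 2, 'C'), (1, 0, 'A'), (1, 1, 'B')) as t (id, idx, c)
group by id;
-- sds_categories is ["A","B","C"]: the array follows idx, not the row order.
```

Without the WITHIN GROUP clause the same query could return the three values in any order.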
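If you really do need DISTINCT, you must somehow associate an index with each distinct c value. One way is to apply the MIN function to index in the input. Here is a complete query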
with cte_get_cats AS
(
select id,
val as category_list
from staging.par.test_json
),
cats_parse AS
(
select id,
parse_json(category_list) as c
from cte_get_cats
),
distinct_cats as
(
select id,
MIN(INDEX) AS index,
UPPER(cast(value:element AS varchar)) As c
from
cats_parse,
LATERAL flatten(INPUT => c:"list")
group by 1,3
) ,
cat_array AS
(
SELECT
id,
array_agg(c) within group (order by index) AS sds_categories
FROM
distinct_cats
GROUP BY 1
),
sds_cats AS
(
select id,
cast(sds_categories[0] AS varchar) as sds_primary_category
from cat_array
)
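-- sds_cats returns the first element; select from cat_array instead to inspect the whole ordered array.
select * from sds_cats;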
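The MIN(index) step can also be tried on its own with a few inline rows. This is only a sketch: the CTE name flattened, the columns idx and element, and the sample values are assumptions chosen so that two elements collapse to the same value after UPPER.

```sql
-- Hypothetical stand-in for the FLATTEN output: 'Plumber' and 'PLUMBER'
-- become duplicates once UPPER is applied.
with flattened as (
    select *
    from (values
        (1, 0, 'Plumber'),
        (1, 1, 'Craft'),
        (1, 2, 'PLUMBER'),
        (1, 3, 'Electrics')
    ) as t (id, idx, element)
),
distinct_cats as (
    -- GROUP BY removes the duplicates; MIN(idx) keeps the position
    -- at which each distinct value first appeared.
    select id,
           min(idx)       as first_idx,
           upper(element) as c
    from flattened
    group by id, upper(element)
)
select id,
       array_agg(c) within group (order by first_idx) as sds_categories
from distinct_cats
group by id;
-- sds_categories is ["PLUMBER","CRAFT","ELECTRICS"]
```

The duplicate at position 2 is dropped, and the remaining values keep the order in which they first appeared in the input.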