
Why does Snowflake change the order of JSON values when flattening them into a list?

I have JSON objects stored in the table and I am trying to write a query to get the first element from that JSON.

Replication Script

create table staging.par.test_json (id int, val varchar(2000)); 

insert into staging.par.test_json values (1, '{"list":[{"element":"Plumber"},{"element":"Craft"},{"element":"Plumbing"},{"element":"Electrics"},{"element":"Electrical"},{"element":"Tradesperson"},{"element":"Home services"},{"element":"Housekeepings"},{"element":"Electrical Goods"}]}');
insert into staging.par.test_json values (2,'
  {
    "list": [
      {
        "element": "Wholesale jeweler"
      },
      {
        "element": "Fashion"
      },
      {
        "element": "Industry"
      },
      {
        "element": "Jewelry store"
      },
      {
        "element": "Business service"
      },
      {
        "element": "Corporate office"
      }
    ]
  }');



with cte_get_cats AS
(
select id, 
       val as category_list 
       from staging.par.test_json
),
cats_parse AS
(
  select id,
         parse_json(category_list) as c
  from cte_get_cats
),
distinct_cats as
(
  select id,
         INDEX,
         UPPER(cast(value:element AS varchar)) As c
  from 
      cats_parse,
      LATERAL flatten(INPUT => c:"list")
  order by 1,2 
) ,
cat_array AS
    (
        SELECT  
            id,
            array_agg(DISTINCT c) AS sds_categories
        FROM
            distinct_cats
        GROUP BY 1
    ),
sds_cats AS
( 
         select id,
         cast(sds_categories[0] AS varchar) as sds_primary_category
         from cat_array
)
select * from sds_cats;

Values: Categories

{"list":[{"element":"Plumber"},{"element":"Craft"},{"element":"Plumbing"},{"element":"Electrics"},{"element":"Electrical"},{"element":"Tradesperson"},{"element":"Home services"},{"element":"Housekeepings"},{"element":"Electrical Goods"}]}

Flattening it to a list gives me

["Plumber","Craft","Plumbing","Electrics","Electrical","Tradesperson","Home services","Housekeepings","Electrical Goods"]

Issue: The order is not always the same. Snowflake sometimes reorders the values, in some cases alphabetically. How can I keep the order fixed? I do not want it to change.

The problem is the way you're using ARRAY_AGG:

        array_agg(DISTINCT c) AS sds_categories

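Specifying it like that gives Snowflake no guidance on how the contents of the array should be ordered. You should not assume that the array will be built in the same order as its input records: it might be, but that's not guaranteed. The order by 1,2 inside the distinct_cats CTE does not help either, because the ordering of a subquery is not carried through to the aggregate. So you probably want to do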

        array_agg(DISTINCT c) within group (order by index) AS sds_categories

But that won't work: once you use DISTINCT c, a given c no longer has a single index value to sort by (Snowflake expects DISTINCT and WITHIN GROUP to refer to the same column). If you don't need DISTINCT, this will work:

        array_agg(c) within group (order by index) AS sds_categories
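
To see the effect of WITHIN GROUP in isolation, here is a minimal sketch on inline data. The alias t and the columns idx and c are made up for illustration; idx stands in for the INDEX column that FLATTEN returns.

```sql
-- Hypothetical inline rows, deliberately listed out of order.
-- idx mimics FLATTEN's INDEX column; c is the category value.
select array_agg(c) within group (order by idx) as ordered_cats
from (values (2, 'PLUMBING'),
             (0, 'PLUMBER'),
             (1, 'CRAFT')) as t (idx, c);
-- The array is built in idx order (PLUMBER, CRAFT, PLUMBING),
-- no matter how the input rows happen to arrive.
```

Without the WITHIN GROUP clause, the same query would be free to return the elements in any order.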

If you do need DISTINCT, you need to associate an index with each distinct c value. One way is to take MIN(index) per value in the input, so that each value keeps the position of its first occurrence. Here's a full query:

with cte_get_cats AS
(
select id, 
       val as category_list 
       from staging.par.test_json
),
cats_parse AS
(
  select id,
         parse_json(category_list) as c
  from cte_get_cats
),
distinct_cats as
(
  select id,
         MIN(INDEX) AS index,
         UPPER(cast(value:element AS varchar)) As c
  from 
      cats_parse,
      LATERAL flatten(INPUT => c:"list")
  group by 1,3 
) ,
cat_array AS
    (
        SELECT  
            id,
            array_agg(c) within group (order by index) AS sds_categories
        FROM
            distinct_cats
        GROUP BY 1
    ),
sds_cats AS
( 
         select id,
         cast(sds_categories[0] AS varchar) as sds_primary_category
         from cat_array
)
select * from cat_array;
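
If only the first category is needed, a shorter route may be to read element 0 of the list directly, skipping FLATTEN and ARRAY_AGG altogether. This is an alternative sketch, not part of the query above. It assumes every row has a non-empty "list" array, and it relies on the fact that deduplicating by first occurrence never changes which element comes first.

```sql
-- Sketch: take the first entry of "list" by position instead of aggregating.
with cats_parse as
(
  select id,
         parse_json(val) as c
  from staging.par.test_json
)
select id,
       upper(c:"list"[0].element::varchar) as sds_primary_category
from cats_parse
order by id;
-- For the two sample rows this should give
-- 1 -> PLUMBER and 2 -> WHOLESALE JEWELER.
```

Because the array position is addressed explicitly, the result does not depend on any row ordering.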

