简体   繁体   中英

Snowflake function with Array as parameter fails with Unsupported subquery error

I have a transaction table and I need to group the similar records and for the class column which might be unique, I need to pick the top most value from the lookup table(class table) our of the values for the similar records ( loopup table is ordered based on the priority).

select * from class;

ID  NAME
2   BETA
6   OMEGA
5   SIGMA
1   ALPHA1
3   GAMMA
4   DELTA
CREATE OR REPLACE FUNCTION "MIN_VALUE"(classlist array)
RETURNS VARCHAR(200)
LANGUAGE SQL
AS '
    select NAME from CLASS 
    where ID in ( select min(ID) from CLASS 
                               where NAME in (select value from table(flatten(input=>classlist))))
';

select * from T_DATA;

C_ID    P_ID    D_ID    S_ID    CLASS
1101111 1404    564     1404    BETA
1101111 1404    599     1425    ALPHA
1101111 1404    564     1404    OMEGA
1101111 1404    564     1425    ALPHA
1101111 1404    564     1404    GAMMA
1101111 1404    564     1425    GAMMA
1101111 1404    599     1425    GAMMA
1101111 1404    564     1425    OMEGA

When I write a query like below it works FINE

select MIN_VALUE(array_construct('OMEGA','GAMMA','BETA'));

When I use it in the actual query it fails with SQL compilation error: Unsupported subquery type cannot be evaluated

select C_ID, P_ID, D_ID, S_ID, MIN_VALUE(class_array) from (
    select C_ID, P_ID, D_ID, S_ID, arrayagg(class) class_array
    from t_data 
    group by C_ID,P_ID,D_ID,S_ID
);

OR

select C_ID,P_ID,D_ID,S_ID,MIN_VALUE(ca) from (
    select C_ID,P_ID,D_ID,S_ID,array_construct(class_array) ca from (
        select C_ID,P_ID,D_ID,S_ID,arrayagg(class) class_array
        from t_data 
        group by C_ID,P_ID,D_ID,S_ID
    )
);

I am expecting an output like below from the 8 records above

select C_ID,P_ID,D_ID,S_ID,array_construct(class_array) ca from (
        select C_ID,P_ID,D_ID,S_ID,arrayagg(class) class_array
        from t_data 
        group by C_ID,P_ID,D_ID,S_ID
    );

Output

C_ID    P_ID    D_ID    S_ID    CLASS_ARRAY
1101111 1404    564     1404    ["OMEGA", "GAMMA", "BETA"]
1101111 1404    599     1425    ["ALPHA", "GAMMA"]
1101111 1404    564     1425    ["ALPHA", "GAMMA", "OMEGA"]

When I use the min_value function on the above class_array that will return a single value based on the priority in the lookup table.

C_ID    P_ID    D_ID    S_ID    CLASS_ARRAY
1101111 1404    564     1404    BETA
1101111 1404    599     1425    ALPHA
1101111 1404    564     1425    ALPHA

Please suggest some options to figure out why the function runs fine for hardcoded values but fails if the Array is constructed in the query and passed as parameter.

Snowflake has some limitations in its support for SQL statements that include some SELECT patterns within column definitions. There are a couple of ways to rewrite the query above to obtain the desired result:

1) Find the min ID and then join back to the class table:

with T as (  
  select C_ID, P_ID, D_ID, S_ID, min(class.id) minclassid
  from t_data join class
     on class.name = t_data.class
  group by C_ID,P_ID,D_ID,S_ID
)
select C_ID, P_ID, D_ID, S_ID, class.name
from T join CLASS on minclassid = class.id;

2) Or use a Windowing function to grab the first class name ordered by ID within a group:

select distinct C_ID, P_ID, D_ID, S_ID, 
   first_value(class.name) over 
     (partition by C_ID, P_ID, D_ID, S_ID order by class.id) name
from t_data join class
on class.name = t_data.class;

This can also be done using the QUALIFY filter which allows filtering after the select stage and having that filter logic not show in the results.

with class as (
    select * from values
      (2, 'BETA'),
      (6, 'OMEGA'),
      (5, 'SIGMA'),
      (1, 'ALPHA'),
      (3, 'GAMMA'),
      (4, 'DELTA')  
      v(id, name)
), t_data as (
    select * from values
      (1101111, 1404, 564, 1404, 'BETA'),
      (1101111, 1404, 599, 1425, 'ALPHA'),
      (1101111, 1404, 564, 1404, 'OMEGA'),
      (1101111, 1404, 564, 1425, 'ALPHA'),
      (1101111, 1404, 564, 1404, 'GAMMA'),
      (1101111, 1404, 564, 1425, 'GAMMA'),
      (1101111, 1404, 599, 1425, 'GAMMA'),
      (1101111, 1404, 564, 1425, 'OMEGA')
      v(C_ID, P_ID, D_ID, S_ID, CLASS)
)
select c_id, p_id, d_id, s_id, d.class
from t_data d
join class c on d.class = c.name
qualify row_number() over (partition by c_id, p_id, d_id, s_id order by c.id) = 1;

gives:

C_ID    P_ID    D_ID    S_ID    CLASS
1101111 1404    564     1404    BETA
1101111 1404    564     1425    ALPHA
1101111 1404    599     1425    ALPHA

which is the same as the more explicit/verbose form:

select c_id, p_id, d_id, s_id, class from (
    select c_id, p_id, d_id, s_id, d.class
        ,row_number() over (partition by c_id, p_id, d_id, s_id order by c.id) as rn
    from t_data d
    join class c on d.class = c.name
)
where rn = 1;

which is really the same mechanics as Stuart's DISTINCT

If you are really want to do it via array's you can order the array on construction with WITHIN GROUP (ORDER BY..) and then you can take the first object, but FIRST_VALUE or QUALIFY methods should be faster.. but if there are other reasons to keep the array this might help

select C_ID, P_ID, D_ID, S_ID, class_array[0] ca from (
    select C_ID, P_ID, D_ID, S_ID, arrayagg(class) within group (order by class.id) class_array
    from t_data
    join class on t_data.class = class.name
    group by C_ID,P_ID,D_ID,S_ID
);

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM