
How to count distinct values with PARTITION BY and ORDER BY in Snowflake SQL?

My data is as follows:

| user | eventorder| postal|
|:---- |:---------:| -----:|
| A    | 1         | 60616 |
| A    | 2         | 10000 |
| A    | 3         | 60616 |
| B    | 1         | 20000 |
| B    | 2         | 30000 |
| B    | 3         | 40000 |
| B    | 4         | 30000 |
| B    | 5         | 20000 |

The problem I need to solve: for each event, how many distinct stops has that user visited up to and including that event order?

The ideal result should be as follows:

| user | eventorder| postal| travelledStop|
|:---- |:---------:| -----:| ------------:|
| A    | 1         | 60616 |  1    |
| A    | 2         | 10000 |  2    |
| A    | 3         | 60616 |  2    |
| B    | 1         | 20000 |  1    |
| B    | 2         | 30000 |  2    |
| B    | 3         | 40000 |  3    |
| B    | 4         | 30000 |  3    |
| B    | 5         | 20000 |  3    |

Take user A as an example. At event order 1, the user has visited only 60616: 1 stop. At event order 2, the user has visited 60616 and 10000: 2 stops. At event order 3, the distinct stops visited are still 60616 and 10000: 2 stops.
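To pin down the expected numbers, here is a minimal sketch of the same rule in plain Python (not Snowflake SQL). The function name `running_distinct` is made up for illustration; it keeps one set of visited postals per user and reports the set size after each event.

```python
# Reference semantics only: plain Python, not Snowflake SQL.
rows = [
    ("A", 1, "60616"), ("A", 2, "10000"), ("A", 3, "60616"),
    ("B", 1, "20000"), ("B", 2, "30000"), ("B", 3, "40000"),
    ("B", 4, "30000"), ("B", 5, "20000"),
]

def running_distinct(rows):
    """Return (user, eventorder, postal, travelledStop) for each row."""
    seen = {}  # user -> set of postals visited so far
    out = []
    # Process each user's events in eventorder sequence.
    for user, eventorder, postal in sorted(rows, key=lambda r: (r[0], r[1])):
        stops = seen.setdefault(user, set())
        stops.add(postal)
        out.append((user, eventorder, postal, len(stops)))
    return out

result = running_distinct(rows)
for row in result:
    print(row)
```

The last element of each tuple corresponds to the travelledStop column in the ideal result table.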

I want to write something like count(distinct postal) over (partition by user order by eventorder), but Snowflake does not allow count(distinct ...) together with order by in the window clause.

Does anyone know how to solve this? Thanks a lot!

I used the sample data you provided (only the rows for user A, but this should scale to the other users). The idea is to build, for each row, an array that accumulates the postals of that event and all previous events, and then count the distinct values in it.

with _temp as (
select 'A' as usr, 1 as EventOrder, '60616' as Postal
UNION ALL
select 'A' as usr, 2 as EventOrder, '10000' as Postal
UNION ALL
select 'A' as usr, 3 as EventOrder, '60616' as Postal
),
_intermediate as (
select usr
    , eventorder
    , postal
    , array_slice(
          array_agg(postal)
            within group (order by eventorder)
            OVER (Partition by usr)
           , 0, eventorder) as full_array
from _temp
group by usr, eventorder, postal
)
select usr, eventorder, postal, count(distinct f.value) as travelledStop
from _intermediate i, lateral flatten(input => i.full_array) f
group by usr, eventorder, postal
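The slicing step in the query above can be mirrored in plain Python to see what it relies on. This is only a sketch: `prefix_distinct` is a hypothetical helper, and it assumes an end index past the array length is clamped the way a Python slice is. Because `eventorder` is used directly as the slice end, the counts are only right when each user's event orders run 1, 2, 3, ... without gaps.

```python
def prefix_distinct(events):
    """events: list of (eventorder, postal) tuples for ONE user."""
    ordered_events = sorted(events)
    # Mirrors array_agg(postal) within group (order by eventorder).
    ordered = [postal for _, postal in ordered_events]
    # Mirrors array_slice(arr, 0, eventorder): keep indexes 0 .. eventorder-1,
    # then count the distinct values in that prefix.
    return [(eo, postal, len(set(ordered[:eo]))) for eo, postal in ordered_events]

dense = [(1, "60616"), (2, "10000"), (3, "60616")]     # event orders 1..3
gappy = [(10, "60616"), (20, "10000"), (30, "60616")]  # same trip, sparse numbering

dense_counts = [count for _, _, count in prefix_distinct(dense)]
gappy_counts = [count for _, _, count in prefix_distinct(gappy)]
print(dense_counts)
print(gappy_counts)
```

With the sparse numbering every slice covers the whole array, so each row reports the final total instead of a running count. If the real data can have gaps, slicing by a per-user row number rather than by the raw `eventorder` value would avoid this.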

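Perhaps the simplest method is to number each postal's occurrences per user in a subquery, so the first visit gets seqnum = 1, and then keep a running count of those "1"s: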

select t.*,
       sum(case when seqnum = 1 then 1 else 0 end) over (partition by usr order by eventorder) as num_postals
from (select t.*,
             row_number() over (partition by usr, postal order by eventorder) as seqnum
      from t
     ) t
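As a quick check of the first-occurrence idea, the sketch below runs essentially the same query against an in-memory SQLite database from Python. SQLite stands in for Snowflake here only because it is easy to run locally; it assumes a bundled SQLite of version 3.25 or newer (needed for window functions), and the table name `t` and the subquery alias `s` are chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (usr text, eventorder integer, postal text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("A", 1, "60616"), ("A", 2, "10000"), ("A", 3, "60616"),
        ("B", 1, "20000"), ("B", 2, "30000"), ("B", 3, "40000"),
        ("B", 4, "30000"), ("B", 5, "20000"),
    ],
)

query = """
select s.usr, s.eventorder, s.postal,
       -- running count of rows that are the first visit to their postal
       sum(case when s.seqnum = 1 then 1 else 0 end)
           over (partition by s.usr order by s.eventorder) as num_postals
from (select t.*,
             -- seqnum = 1 marks the first time this user reaches this postal
             row_number() over (partition by usr, postal order by eventorder) as seqnum
      from t
     ) s
order by s.usr, s.eventorder
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```

The `num_postals` column should line up with the travelledStop column in the question's ideal result.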

I like @Daniel Zagales's answer, but here is a workaround using dense_rank and sum:

with temp as (
select 'A' as usr, 1 as EventOrder, '60616' as Postal
UNION ALL
select 'A' as usr, 2 as EventOrder, '10000' as Postal
UNION ALL
select 'A' as usr, 3 as EventOrder, '60616' as Postal  
UNION ALL
select 'B' as usr, 1 as EventOrder, '20000' as Postal  
UNION ALL
select 'B' as usr, 2 as EventOrder, '30000' as Postal  
UNION ALL
select 'B' as usr, 3 as EventOrder, '40000' as Postal 
UNION ALL
select 'B' as usr, 4 as EventOrder, '30000' as Postal  
UNION ALL
select 'B' as usr, 5 as EventOrder, '20000' as Postal 
),
temp2 as (
select temp.*, dense_rank() over (partition by usr, Postal order by EventOrder) as rks
from temp
)
select usr, eventorder, postal,
       sum(case when rks = 1 then 1 else 0 end) over (partition by usr order by EventOrder) as travelledStop
from temp2
order by usr, EventOrder

Basically, dense_rank marks the first appearance of each stop for a user (rks = 1), and the running sum then counts those first appearances.

