How to count distinct values with partition by and order by in Snowflake SQL?

My data looks like this:

| user | eventorder| postal|
|:---- |:---------:| -----:|
| A    | 1         | 60616 |
| A    | 2         | 10000 |
| A    | 3         | 60616 |
| B    | 1         | 20000 |
| B    | 2         | 30000 |
| B    | 3         | 40000 |
| B    | 4         | 30000 |
| B    | 5         | 20000 |

The problem I need to solve: how many distinct stops has the user travelled to by each eventorder?

The ideal result should look like this:

| user | eventorder| postal| travelledStop|
|:---- |:---------:| -----:| ------------:|
| A    | 1         | 60616 |  1    |
| A    | 2         | 10000 |  2    |
| A    | 3         | 60616 |  2    |
| B    | 1         | 20000 |  1    |
| B    | 2         | 30000 |  2    |
| B    | 3         | 40000 |  3    |
| B    | 4         | 30000 |  3    |
| B    | 5         | 20000 |  3    |

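Take A as an example. At eventorder 1, the user has travelled only to 60616, so 1 stop. At eventorder 2, the user has travelled to 60616 and 10000, so 2 stops. At eventorder 3, the distinct stops are still 60616 and 10000, so 2 stops.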
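
As a plain reference for what the expected column means, here is a minimal Python sketch (not Snowflake code) that walks each user's events in order and keeps a set of the postal codes seen so far. The function name `running_distinct` and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict

# Sample rows from the question: (user, eventorder, postal).
rows = [
    ("A", 1, "60616"), ("A", 2, "10000"), ("A", 3, "60616"),
    ("B", 1, "20000"), ("B", 2, "30000"), ("B", 3, "40000"),
    ("B", 4, "30000"), ("B", 5, "20000"),
]

def running_distinct(rows):
    """Return (user, eventorder, postal, travelledStop) for each row."""
    seen = defaultdict(set)  # user -> postal codes visited so far
    out = []
    # Sort by user, then eventorder, so each user's events are handled in order.
    for user, eventorder, postal in sorted(rows, key=lambda r: (r[0], r[1])):
        seen[user].add(postal)
        out.append((user, eventorder, postal, len(seen[user])))
    return out

result = running_distinct(rows)
for row in result:
    print(row)
```

Any SQL solution should produce the same last column for this sample data.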

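I am not allowed to use count distinct together with partition by ... order by. I want to do something like count(distinct(postal)) over (partition by user order by eventorder), but that is not allowed.

Does anyone know how to solve this? Thanks a lot!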

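I used the sample data you provided (only a subset for A, but this should scale). The goal here is essentially to generate, for each row, an array that accumulates the postal values of all events up to and including that row.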

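with _temp as (
-- sample data: a subset of the question's rows for user A
select 'A' as usr, 1 as EventOrder, '60616' as Postal
UNION ALL
select 'A' as usr, 2 as EventOrder, '10000' as Postal
UNION ALL
select 'A' as usr, 3 as EventOrder, '60616' as Postal
),
_intermediate as (
select usr
    , eventorder
    , postal
    -- collect all of the user's postals in event order, then keep the first
    -- `eventorder` elements (this assumes eventorder runs 1, 2, 3, ... without gaps)
    , array_slice(
          array_agg(postal)
            within group (order by eventorder)
            OVER (Partition by usr)
           , 0, eventorder) as full_array
from _temp
group by usr, eventorder, postal
)
-- flatten each row's array and count the distinct postals in it
select usr, eventorder, postal, count(distinct f.value) as travelledStop
from _intermediate i, lateral flatten(input => i.full_array) f
group by usr, eventorder, postal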
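
To make the array idea concrete, here is a rough Python sketch of the same three steps: collect each user's postals in event order, take the prefix up to the current row, and count the distinct values in that prefix. It is an illustration only, not Snowflake code, and the name `prefix_distinct` is hypothetical. One difference is deliberate: the sketch slices by row position, whereas the query above slices by `eventorder` itself, which only gives the right prefix when eventorder is 1, 2, 3, ... with no gaps.

```python
# Sample rows from the question: (user, eventorder, postal).
rows = [
    ("A", 1, "60616"), ("A", 2, "10000"), ("A", 3, "60616"),
    ("B", 1, "20000"), ("B", 2, "30000"), ("B", 3, "40000"),
    ("B", 4, "30000"), ("B", 5, "20000"),
]

def prefix_distinct(rows):
    """Count distinct postals in each row's prefix array."""
    by_user = {}
    for user, eventorder, postal in sorted(rows, key=lambda r: (r[0], r[1])):
        by_user.setdefault(user, []).append((eventorder, postal))

    out = []
    for user, events in by_user.items():
        # Roughly array_agg(postal) within group (order by eventorder).
        postals = [postal for _, postal in events]
        for position, (eventorder, postal) in enumerate(events, start=1):
            prefix = postals[:position]    # roughly array_slice(arr, 0, n)
            distinct = len(set(prefix))    # roughly flatten + count(distinct)
            out.append((user, eventorder, postal, distinct))
    return out

result = prefix_distinct(rows)

# Slicing by position still works when eventorder has gaps.
gapped = prefix_distinct([("A", 10, "x"), ("A", 20, "y"), ("A", 30, "x")])
```

Because every row carries a copy of its prefix, the work grows quadratically with the number of events per user, which may matter for long histories.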

Perhaps the simplest method is to use a subquery and count the "1"s:

select t.*,
       sum(case when seqnum = 1 then 1 else 0 end) over (partition by usr order by eventorder) as num_postals
from (select t.*,
             row_number() over (partition by usr, postal order by eventorder) as seqnum
      from t
     ) t
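
One way to sanity-check this query without a Snowflake account is to run it against an in-memory SQLite database from Python. This is only a sketch under assumptions: it relies on the bundled SQLite supporting window functions (version 3.25 or later), the table name `t` is taken from the query above, and the subquery alias is renamed to `s` purely for readability. SQLite is not Snowflake, so this checks the logic rather than Snowflake-specific behaviour.

```python
import sqlite3

# Sample rows from the question: (usr, eventorder, postal).
rows = [
    ("A", 1, "60616"), ("A", 2, "10000"), ("A", 3, "60616"),
    ("B", 1, "20000"), ("B", 2, "30000"), ("B", 3, "40000"),
    ("B", 4, "30000"), ("B", 5, "20000"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (usr text, eventorder integer, postal text)")
conn.executemany("insert into t values (?, ?, ?)", rows)

# seqnum = 1 marks the first time a user visits a postal code;
# the running sum of those marks is the distinct count so far.
query = """
select s.usr, s.eventorder, s.postal,
       sum(case when s.seqnum = 1 then 1 else 0 end)
           over (partition by s.usr order by s.eventorder) as num_postals
from (select t.*,
             row_number() over (partition by usr, postal
                                order by eventorder) as seqnum
      from t
     ) s
order by s.usr, s.eventorder
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```

The last column of each printed row should match the travelledStop column in the question's expected table.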

I like @Daniel Zagales' answer, but here is a workaround using dense_rank and sum:

with temp as (
    select 'A' as usr, 1 as EventOrder, '60616' as Postal
    UNION ALL
    select 'A' as usr, 2 as EventOrder, '10000' as Postal
    UNION ALL
    select 'A' as usr, 3 as EventOrder, '60616' as Postal
    UNION ALL
    select 'B' as usr, 1 as EventOrder, '20000' as Postal
    UNION ALL
    select 'B' as usr, 2 as EventOrder, '30000' as Postal
    UNION ALL
    select 'B' as usr, 3 as EventOrder, '40000' as Postal
    UNION ALL
    select 'B' as usr, 4 as EventOrder, '30000' as Postal
    UNION ALL
    select 'B' as usr, 5 as EventOrder, '20000' as Postal
),
temp2 as (
    select temp.*,
           dense_rank() over (partition by usr, Postal order by EventOrder) as rks
    from temp
)
select usr,
       EventOrder,
       Postal,
       sum(case when rks = 1 then 1 else 0 end)
           over (partition by usr order by EventOrder) as travelledStop
from temp2
order by usr, EventOrder

Basically, dense_rank is used to flag the first occurrence of each stop, and those flags are then summed up.
