Rolling Aggregation
I am trying to write a procedure in SQL Server that is based on a rolling date aggregation.
Take the following data:
Acc Dte Amount
1 1/1/20 100
1 1/3/20 200
1 1/8/20 100
1 1/8/20 75
2 1/1/20 50
2 1/2/20 100
2 1/3/20 75
2 1/3/20 125
3 1/3/20 100
3 1/6/20 75
3 1/8/20 75
3 1/10/20 200
3 1/10/20 150
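The goal is, for each account and date being analyzed, to find the count of earlier dates and the average amount per day over those earlier dates. I also need to sum the records that share a date. Based on the data above, the result would look like this: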
Acc Dte Num_of_dates Avg_Amount_per_day Current_Amount
1 1/3/20 1 100 200
1 1/8/20 2 150 175
2 1/2/20 1 50 100
2 1/3/20 2 75 200
3 1/6/20 1 100 75
3 1/8/20 2 87.5 75
3 1/10/20 3 83.3 350
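The end goal is to create a z-score that compares an account's amount for the current day with that account's average amount per day. We also need at least 10 days of history for each account before scoring it.
Right now my code looks like this, and it does not work: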
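For reference, a sketch of the z-score this seems to describe, assuming the standard definition (the post does not spell it out). The symbols are introduced here for illustration only: $x_d$ is the account's total amount on day $d$, $x_1, \dots, x_n$ are its daily totals on the $n$ earlier days, and the sample form of the standard deviation is an assumption (it matches what SQL Server's STDEV computes).

```latex
z_d = \frac{x_d - \mu_{<d}}{\sigma_{<d}}, \qquad
\mu_{<d} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad
\sigma_{<d} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \mu_{<d} \right)^2}, \qquad
n \ge 10
```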
select Account,
Dte,
(select sum(case when Cast(EventTimestamp as DATE) < Dte then 1 else 0 end) Num_of_Date,
(select (case when Cast(EventTimestamp as DATE) < Dte then sum(Amount) else 0 end) t_amount
from Data
group by Account, Dte
Any ideas? Thanks.
You can use window functions with the appropriate rows clause. For once, distinct comes in handy here:
select distinct
    acc,
    dte,
    count(*) over(
        partition by acc
        order by dte
        rows between unbounded preceding and 1 preceding
    ) num_of_dates,
    avg(1.0 * amount) over(
        partition by acc
        order by dte
        rows between unbounded preceding and 1 preceding
    ) avg_amount_per_day,
    sum(amount) over(partition by acc, dte) current_amount
from mytable
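If you really want only one record per date and account, as shown in your sample data, you can nest the query and use row_number(). In the absence of an obvious column to define the sort order, I rely on the cumulative count: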
select acc, dte, num_of_dates, avg_amount_per_day, current_amount
from (
    select
        t.*,
        row_number() over(partition by acc, dte order by num_of_dates) rn
    from (
        select
            acc,
            dte,
            count(*) over(
                partition by acc
                order by dte
                rows between unbounded preceding and 1 preceding
            ) num_of_dates,
            avg(1.0 * amount) over(
                partition by acc
                order by dte
                rows between unbounded preceding and 1 preceding
            ) avg_amount_per_day,
            sum(amount) over(partition by acc, dte) current_amount
        from mytable
    ) t
) t
where rn = 1 and avg_amount_per_day is not null
acc | dte        | num_of_dates | avg_amount_per_day | current_amount
--: | :--------- | -----------: | :----------------- | -------------:
  1 | 2020-01-03 |            1 | 100.000000         |            200
  1 | 2020-01-08 |            2 | 150.000000         |            175
  2 | 2020-01-02 |            1 | 50.000000          |            100
  2 | 2020-01-03 |            2 | 75.000000          |            200
  3 | 2020-01-06 |            1 | 100.000000         |             75
  3 | 2020-01-08 |            2 | 87.500000          |             75
  3 | 2020-01-10 |            3 | 83.333333          |            350
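One thing that may be worth checking before relying on a row-level frame: it counts and averages earlier rows, not earlier days. In the sample data the repeated dates only occur on each account's last date, so the two readings coincide there. The sketch below uses made-up rows (account 9 is hypothetical) and runs the same window expressions through Python's sqlite3 module, chosen only so it can be tried without a SQL Server instance; it assumes a SQLite build with window function support (3.25 or later).

```python
import sqlite3

# Hypothetical rows (acc, dte, amount): account 9 has two records on its FIRST date.
ROWS = [
    (9, "2020-01-01", 10),
    (9, "2020-01-01", 30),
    (9, "2020-01-02", 50),
]

def row_level_history(rows):
    """Run the row-level window query; return {(acc, dte): {(count, avg), ...}}."""
    con = sqlite3.connect(":memory:")
    con.execute("create table mytable (acc integer, dte text, amount integer)")
    con.executemany("insert into mytable values (?, ?, ?)", rows)
    sql = """
        select distinct acc, dte,
               count(*) over (partition by acc order by dte
                   rows between unbounded preceding and 1 preceding) as num_of_dates,
               avg(1.0 * amount) over (partition by acc order by dte
                   rows between unbounded preceding and 1 preceding) as avg_amount_per_day
        from mytable
    """
    result = {}
    for acc, dte, n, avg in con.execute(sql):
        result.setdefault((acc, dte), set()).add((n, avg))
    con.close()
    return result

history = row_level_history(ROWS)
# Both 2020-01-01 records fall inside the frame: 2 earlier rows, average (10 + 30) / 2.
print(history[(9, "2020-01-02")])
```

A per-day reading would instead report 1 earlier date with an average of 40 (the day total), so if earlier dates can repeat, aggregating by day before applying the window is probably the safer route.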
Your sample data and description suggest:
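select acc, dte,
       count(*) as num_on_day,
       sum(amount) as sum_on_day,
       -- After GROUP BY there is one row per (acc, dte), so a ROWS frame ending at
       -- 1 preceding covers exactly the earlier dates. SQL Server does not accept
       -- RANGE with a numeric offset, so no day-number column is needed here.
       avg(sum(1.0 * amount)) over (
           partition by acc
           order by dte
           rows between unbounded preceding and 1 preceding
       ) as avg_previous
from t
group by acc, dte;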
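The idea of aggregating per day first and then looking back over earlier days can be tried on the question's sample data. This is a minimal sketch, not the answerer's exact query: it uses Python's sqlite3 module only so it runs without SQL Server (assuming SQLite 3.25 or later for window functions), splits the aggregation into a CTE named daily (a hypothetical name), stores dates as ISO strings so they sort correctly, and drops each account's first date to match the expected output.

```python
import sqlite3

# Sample data from the question as (acc, dte, amount), dates rewritten as ISO strings.
SAMPLE = [
    (1, "2020-01-01", 100), (1, "2020-01-03", 200),
    (1, "2020-01-08", 100), (1, "2020-01-08", 75),
    (2, "2020-01-01", 50), (2, "2020-01-02", 100),
    (2, "2020-01-03", 75), (2, "2020-01-03", 125),
    (3, "2020-01-03", 100), (3, "2020-01-06", 75), (3, "2020-01-08", 75),
    (3, "2020-01-10", 200), (3, "2020-01-10", 150),
]

def prior_day_stats(rows):
    """Return (acc, dte, num_of_dates, avg_amount_per_day, current_amount) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (acc integer, dte text, amount integer)")
    con.executemany("insert into t values (?, ?, ?)", rows)
    sql = """
        with daily as (              -- one row per account and date
            select acc, dte, sum(amount) as day_total
            from t
            group by acc, dte
        )
        select acc, dte, num_of_dates, avg_amount_per_day, current_amount
        from (
            select acc, dte,
                   count(*) over (partition by acc order by dte
                       rows between unbounded preceding and 1 preceding) as num_of_dates,
                   avg(1.0 * day_total) over (partition by acc order by dte
                       rows between unbounded preceding and 1 preceding) as avg_amount_per_day,
                   day_total as current_amount
            from daily
        ) x
        where num_of_dates > 0       -- drop each account's first date (no history yet)
        order by acc, dte
    """
    out = con.execute(sql).fetchall()
    con.close()
    return out

stats = prior_day_stats(SAMPLE)
for row in stats:
    print(row)
```

Because the window runs over one row per day, num_of_dates here really is a count of earlier dates, and the average is an average of daily totals even when an earlier date has several records.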
I'm not sure why you don't include the first date for each acc.
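Coming back to the question's end goal, here is a rough sketch of the z-score step with the minimum-history rule, written in plain Python so the arithmetic is easy to follow. Everything named here (daily_z_scores, min_history, the SAMPLE list) is hypothetical, and the use of the sample standard deviation is an assumption, since the question does not say which form is wanted.

```python
from collections import defaultdict
from statistics import mean, stdev

# Sample data from the question as (acc, dte, amount), dates as ISO strings.
SAMPLE = [
    (1, "2020-01-01", 100), (1, "2020-01-03", 200),
    (1, "2020-01-08", 100), (1, "2020-01-08", 75),
    (2, "2020-01-01", 50), (2, "2020-01-02", 100),
    (2, "2020-01-03", 75), (2, "2020-01-03", 125),
    (3, "2020-01-03", 100), (3, "2020-01-06", 75), (3, "2020-01-08", 75),
    (3, "2020-01-10", 200), (3, "2020-01-10", 150),
]

def daily_z_scores(records, min_history=10):
    """Return {(acc, dte): z} for days with at least min_history earlier days."""
    # Step 1: sum the records that share an account and date.
    totals = defaultdict(float)
    for acc, dte, amount in records:
        totals[(acc, dte)] += amount

    # Step 2: group the daily totals by account in date order (ISO strings sort correctly).
    by_acc = defaultdict(list)
    for (acc, dte), total in sorted(totals.items()):
        by_acc[acc].append((dte, total))

    # Step 3: compare each day with the mean and spread of the earlier days only.
    scores = {}
    for acc, days in by_acc.items():
        for i, (dte, total) in enumerate(days):
            history = [t for _, t in days[:i]]
            if len(history) < max(min_history, 2):  # stdev needs at least 2 points
                continue
            sd = stdev(history)
            if sd == 0:
                continue  # z-score is undefined when the history has no spread
            scores[(acc, dte)] = (total - mean(history)) / sd
    return scores

# The sample has at most 3 earlier days per account, so the 10-day rule yields nothing.
print(daily_z_scores(SAMPLE))
# Lowering the threshold, purely to see some numbers:
print(daily_z_scores(SAMPLE, min_history=2))
```

With the default threshold of 10 the sample produces no scores at all, which is the point of the rule: accounts with too little history are skipped rather than scored against an unstable average.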