Aggregate function to detect trend in PostgreSQL
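I am using a PostgreSQL database to store data with the following structure:
datapoint(userId, rank, timestamp)
where timestamp is a Unix epoch timestamp in milliseconds.
In this structure I store each user's rank once a day, so the data looks like this: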
UserId Rank Timestamp
1 1 1435366459
1 2 1435366458
1 3 1435366457
2 8 1435366456
2 6 1435366455
2 7 1435366454
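So in the sample data above, userId 1 improved its rank with every measurement, which means it has a positive trend, while userId 2 dropped in rank, which means it has a negative trend.
What I need to do is detect all users that have a positive trend based on the last N measurements.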
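One approach is to perform a linear regression on each user's rank and check whether the slope is positive or negative. Luckily, PostgreSQL has a built-in aggregate function for this, regr_slope(Y, X):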
SELECT user_id, regr_slope (rank1, timestamp1) AS slope
FROM my_table
GROUP BY user_id
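To make the sign convention concrete, here is a minimal sketch that runs the same aggregate over the question's sample rows, inlined as a CTE so it is self-contained. The table and column names (my_table, user_id, rank1, timestamp1) simply mirror the query above and are assumptions about the real schema.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
-- Table and column names are assumptions that mirror the query above.
WITH my_table (user_id, rank1, timestamp1) AS (
    VALUES (1, 1, 1435366459),
           (1, 2, 1435366458),
           (1, 3, 1435366457),
           (2, 8, 1435366456),
           (2, 6, 1435366455),
           (2, 7, 1435366454)
)
SELECT user_id,
       regr_slope(rank1, timestamp1) AS slope  -- regr_slope(Y, X): rank is Y, time is X
FROM my_table
GROUP BY user_id
ORDER BY user_id;
```

For this data the least-squares slope is -1 for user 1 and 0.5 for user 2. In other words, the user whose rank value falls over time (3, 2, 1), which the question treats as an improvement, gets a negative slope.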
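This query gives you the basic functionality. If you like, you can dress it up with a CASE expression. Note that the labels below follow the sign of the slope; if a lower rank value means a better position, as in the question's sample data, an improving user has a negative slope, so swap the labels accordingly: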
SELECT user_id,
CASE WHEN slope > 0 THEN 'positive'
WHEN slope < 0 THEN 'negative'
ELSE 'steady' END AS trend
FROM (SELECT user_id, regr_slope (rank1, timestamp1) AS slope
FROM my_table
GROUP BY user_id) t
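Edit:
Unfortunately, regr_slope has no built-in way to handle a "top N" type of requirement, so this has to be handled separately, for example with a subquery that uses row_number():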
-- Decoration outer query
SELECT user_id,
       CASE WHEN slope > 0 THEN 'positive'
            WHEN slope < 0 THEN 'negative'
            ELSE 'steady' END AS trend
FROM (-- Inner query to calculate the slope
      SELECT user_id, regr_slope (rank1, timestamp1) AS slope
      FROM (-- Inner query to get top N
            -- timestamp1 must be selected here because regr_slope needs it
            SELECT user_id, rank1, timestamp1,
                   ROW_NUMBER() OVER (PARTITION BY user_id
                                      ORDER BY timestamp1 DESC) AS rn
            FROM my_table) t
      WHERE rn <= N -- Replace N with the number of rows you need
      GROUP BY user_id) t2
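You can use analytic (window) functions. The overall approach:
- compute the previous rank with lag()
- use a CASE expression to mark each step as positive (1) or not (0)
- use min() over the last N rows, with a rows between N preceding and 0 following clause, to get the overall trend
Code: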
select v2.*,
min(positive_trend) over (partition by userid order by timestamp1
rows between 3 preceding and 0 following) as trend_overall
from (
select v1.*,
(case when prev_rank < rank1 then 0 else 1 end) as positive_trend
from (
select userid,
rank1,
timestamp1,
lag(rank1) over (partition by userid order by timestamp1) as prev_rank
from t1
order by userid, timestamp1
) v1
) v2
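As a rough check of how the flags behave, the following sketch applies the same lag / CASE / min() steps to the question's sample rows, inlined as a CTE named t1 to match the query above. The column names are assumptions, and CURRENT ROW is used in place of 0 FOLLOWING; the two frame bounds are equivalent.

```sql
-- Sample rows from the question; names mirror the query above (assumed schema).
WITH t1 (userid, rank1, timestamp1) AS (
    VALUES (1, 1, 1435366459),
           (1, 2, 1435366458),
           (1, 3, 1435366457),
           (2, 8, 1435366456),
           (2, 6, 1435366455),
           (2, 7, 1435366454)
)
SELECT userid, rank1, timestamp1, positive_trend,
       -- 1 only if every step in the last 4 rows (3 preceding + current) was flagged 1
       min(positive_trend) OVER (PARTITION BY userid ORDER BY timestamp1
                                 ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS trend_overall
FROM (
    SELECT userid, rank1, timestamp1,
           -- 0 when the rank value went up since the previous measurement;
           -- the first row has no previous rank and falls through to 1
           CASE WHEN lag(rank1) OVER (PARTITION BY userid ORDER BY timestamp1) < rank1
                THEN 0 ELSE 1 END AS positive_trend
    FROM t1
) v
ORDER BY userid, timestamp1;
```

On this data user 1 (ranks 3, 2, 1 in time order) gets trend_overall = 1 on every row, while user 2 (ranks 7, 6, 8) gets 1, 1, 0: the final step from 6 to 8 is flagged 0, and min() carries that into the overall trend.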
UPDATE
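To get only the user IDs together with their overall trend and rank delta, you have to add another call to lag(.., N+1) to get the N-th previous rank, and row_number() to number the rows within each user ID: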
select v3.userid, v3.trend_overall, delta_rank
from (
select v2.*,
min(positive_trend) over (partition by userid order by timestamp1
rows between 3 preceding and 0 following) as trend_overall,
latest_rank - prev_N_rank as delta_rank
from (
select v1.*,
(case when prev_rank < rank1 then 0 else 1 end) as positive_trend,
max(case when v1.rn = 1 then rank1 else NULL end) over (partition by userid) as latest_rank
from (
select userid,
rank1,
timestamp1,
lag(rank1) over (partition by userid order by timestamp1) as prev_rank,
lag(rank1, 4) over (partition by userid order by timestamp1) as prev_N_rank,
row_number() over (partition by userid order by timestamp1 desc) as rn
from t1
order by userid, timestamp1
) v1
) v2
) v3
where rn = 1
group by userid, trend_overall, delta_rank
order by userid, trend_overall, delta_rank