
Aggregate function to detect trend in PostgreSQL

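I'm using a PostgreSQL (psql) database to store a data structure like this:

datapoint(userId, rank, timestamp)

where timestamp is a Unix epoch timestamp in milliseconds.

In this structure I store each user's rank once a day, so it looks like this: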

UserId   Rank  Timestamp
1        1     1435366459
1        2     1435366458
1        3     1435366457
2        8     1435366456
2        6     1435366455
2        7     1435366454

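So in the sample data above, userId 1 improved its rank with every measurement, which means it has a positive trend, while userId 2 dropped in rank, which means it has a negative trend.

What I need to do is find all users that have a positive trend, based on the last N measurements.

One approach is to run a linear regression on each user's ranks and check whether the slope is positive or negative. Fortunately, PostgreSQL has a built-in function for this, regr_slope(Y, X), which takes the dependent value (here the rank) first and the independent value (here the timestamp) second: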

SELECT   user_id, regr_slope (rank1, timestamp1) AS slope
FROM     my_table
GROUP BY user_id

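This query gives you the basic functionality. Keep in mind that a smaller rank number means a better position, so a user whose rank improves over time has a negative slope. If you like, you can then decorate the result with a CASE expression: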

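SELECT user_id,
       -- A smaller rank number is better, so an improving user has a negative slope.
       -- ELSE also covers a NULL slope (e.g. a user with a single row).
       CASE WHEN slope < 0 THEN 'positive'
            WHEN slope > 0 THEN 'negative'
            ELSE 'steady' END AS trend
FROM   (SELECT   user_id, regr_slope (rank1, timestamp1) AS slope
        FROM     my_table
        GROUP BY user_id) t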
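To see what regr_slope(rank1, timestamp1) returns on the sample data, here is a minimal pure-Python sketch of the same least-squares slope (the sum of centred x·y products divided by the sum of squared centred x values). The helper name least_squares_slope is hypothetical, and returning None stands in for the NULL that regr_slope gives when the slope is undefined. It makes the sign convention visible: user 1, whose rank number falls from 3 to 1, gets a negative slope.

```python
from collections import defaultdict

# Sample rows from the question: (user_id, rank, timestamp)
ROWS = [
    (1, 1, 1435366459), (1, 2, 1435366458), (1, 3, 1435366457),
    (2, 8, 1435366456), (2, 6, 1435366455), (2, 7, 1435366454),
]

def least_squares_slope(points):
    """Slope b of the least-squares line y = a + b*x over (x, y) pairs.

    Hypothetical helper mimicking regr_slope(y, x); returns None when the
    slope is undefined (fewer than two points or all x values equal).
    """
    n = len(points)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        return None
    return sxy / sxx

# Collect (timestamp, rank) pairs per user, like GROUP BY user_id
by_user = defaultdict(list)
for user_id, rank, ts in ROWS:
    by_user[user_id].append((ts, rank))

slopes = {user: least_squares_slope(pts) for user, pts in by_user.items()}
print(slopes)  # {1: -1.0, 2: 0.5}
```

User 1 (improving) comes out at -1.0 rank per time unit and user 2 (worsening) at +0.5, which is why a negative slope is the "positive" trend in this data.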

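EDIT:
Unfortunately, regr_slope has no built-in way to handle a "last N rows" type of requirement, so this has to be handled separately, e.g. with a subquery that uses row_number: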

-- Decoration outer query
SELECT user_id,
       -- A smaller rank number is better, so an improving user has a negative slope
       CASE WHEN slope < 0 THEN 'positive'
            WHEN slope > 0 THEN 'negative'
            ELSE 'steady' END AS trend
FROM   (-- Inner query to calculate the slope
        SELECT   user_id, regr_slope (rank1, timestamp1) AS slope
        FROM     (-- Inner query to number each user's rows, newest first
                  SELECT user_id, rank1, timestamp1,
                         ROW_NUMBER() OVER (PARTITION BY user_id
                                            ORDER BY timestamp1 DESC) AS rn
                  FROM   my_table) t
        WHERE    rn <= N -- Replace N with the number of rows you need (at least 2)
        GROUP BY user_id) t2
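The last-N idea can be tried without a PostgreSQL server. The sketch below is a stand-in under stated assumptions, not the answer's query: it runs in Python's built-in sqlite3 module (window functions need SQLite 3.25 or later), and because SQLite has no regr_slope, it computes the least-squares slope by hand from mean-centred values. The table and column names follow the answer; LAST_N_SLOPE_SQL and trends are hypothetical names. It applies the question's sign convention, treating a falling rank number as a positive trend.

```python
import sqlite3

# SQLite has no regr_slope, so the slope is sum(dx*dy) / sum(dx*dx)
# over mean-centred timestamps (dx) and ranks (dy).
LAST_N_SLOPE_SQL = """
WITH numbered AS (
    SELECT user_id, rank1, timestamp1,
           ROW_NUMBER() OVER (PARTITION BY user_id
                              ORDER BY timestamp1 DESC) AS rn
    FROM my_table
),
centred AS (
    -- window averages are evaluated after WHERE, i.e. over the last N rows only
    SELECT user_id,
           timestamp1 - AVG(timestamp1) OVER (PARTITION BY user_id) AS dx,
           rank1 - AVG(rank1) OVER (PARTITION BY user_id) AS dy
    FROM numbered
    WHERE rn <= :n
)
SELECT user_id, SUM(dx * dy) / NULLIF(SUM(dx * dx), 0) AS slope
FROM centred
GROUP BY user_id
ORDER BY user_id
"""

def trends(conn, n):
    """Map user_id -> 'positive' / 'negative' / 'steady' over the last n rows."""
    result = {}
    for user_id, slope in conn.execute(LAST_N_SLOPE_SQL, {"n": n}):
        # A smaller rank number is better, so an improving user has slope < 0.
        if slope is not None and slope < 0:
            result[user_id] = "positive"
        elif slope is not None and slope > 0:
            result[user_id] = "negative"
        else:
            result[user_id] = "steady"  # NULL slope or exactly flat
    return result

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (user_id INTEGER, rank1 INTEGER, timestamp1 INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 1, 1435366459), (1, 2, 1435366458), (1, 3, 1435366457),
     (2, 8, 1435366456), (2, 6, 1435366455), (2, 7, 1435366454)],
)
print(trends(conn, 3))  # {1: 'positive', 2: 'negative'}
```

Calling trends(conn, 1) gives 'steady' for both users, because a single point has no slope, so N should be at least 2.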

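You can use analytic (window) functions. The general approach:

  • Use lag() to get each row's previous rank
  • Use a CASE expression to flag each step: 1 if the rank number did not go up (got worse) compared to the previous row, otherwise 0
  • Use min() to get the minimum flag over the current row and the N preceding rows; it returns 1 if the trend was positive in all of those rows, otherwise 0. To limit it to those rows, use the window frame clause between N preceding and 0 following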

Code:

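select v2.*,
  -- 1 only if every flag in the current row and the 3 preceding rows is 1
  min(positive_trend) over (partition by userid order by timestamp1
                             rows between 3 preceding and 0 following) as trend_overall
from (
  select v1.*,
    -- 0 if the rank number went up (got worse), else 1
    -- (also 1 on a user's first row, where prev_rank is NULL)
    (case when prev_rank < rank1 then 0 else 1 end) as positive_trend
  from (
    select userid,
      rank1,
      timestamp1,
      -- rank from the same user's previous measurement
      lag(rank1) over (partition by userid order by timestamp1) as prev_rank
    from t1
    order by userid, timestamp1
  ) v1
) v2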

SQL Fiddle
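Here is the same lag()/min() pipeline run against the sample rows in Python's built-in sqlite3 module, as a sketch (SQLite 3.25 or later is assumed for window functions). It writes the frame end as CURRENT ROW, which is equivalent to 0 FOLLOWING; the helper name trend_per_row and its n_preceding parameter are hypothetical. Because the frame size is formatted into the SQL text, it is forced to an int first.

```python
import sqlite3

# Sample rows from the question, in a table named t1 as in this answer
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (userid INTEGER, rank1 INTEGER, timestamp1 INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, 1, 1435366459), (1, 2, 1435366458), (1, 3, 1435366457),
     (2, 8, 1435366456), (2, 6, 1435366455), (2, 7, 1435366454)],
)

def trend_per_row(conn, n_preceding=3):
    """Return (userid, timestamp1, positive_trend, trend_overall) per row.

    positive_trend is 1 unless the rank number went up versus the previous
    row; trend_overall is the minimum flag over the current row and the
    n_preceding rows before it.
    """
    frame = int(n_preceding)  # formatted into the SQL text, so force an int
    sql = f"""
    SELECT userid, timestamp1, positive_trend,
           MIN(positive_trend) OVER (PARTITION BY userid ORDER BY timestamp1
                                     ROWS BETWEEN {frame} PRECEDING
                                     AND CURRENT ROW) AS trend_overall
    FROM (
        SELECT v1.*,
               CASE WHEN prev_rank < rank1 THEN 0 ELSE 1 END AS positive_trend
        FROM (
            SELECT userid, rank1, timestamp1,
                   LAG(rank1) OVER (PARTITION BY userid
                                    ORDER BY timestamp1) AS prev_rank
            FROM t1
        ) v1
    ) v2
    ORDER BY userid, timestamp1
    """
    return conn.execute(sql).fetchall()

rows = trend_per_row(conn)
for row in rows:
    print(row)
```

For the sample data, every row of user 1 ends up with trend_overall = 1, while user 2's latest row (rank 6 to 8) gets 0.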

UPDATE

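To get only the user IDs with their overall trend and rank delta, you have to add another call, lag(.., N+1), to get the rank from N+1 measurements earlier, and row_number() to number the rows of each user ID so that only the latest row is kept: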

select v3.userid, v3.trend_overall, delta_rank
from (
  select v2.*,
    -- N = 3: the current row plus the 3 preceding rows
    min(positive_trend) over (partition by userid order by timestamp1
                               rows between 3 preceding and 0 following) as trend_overall,
    -- negative delta_rank = the rank number went down, i.e. improved
    latest_rank - prev_N_rank as delta_rank
  from (
    select v1.*,
      (case when prev_rank < rank1 then 0 else 1 end) as positive_trend,
      -- rank of the user's newest row (rn = 1), repeated on all of that user's rows
      max(case when v1.rn = 1 then rank1 else NULL end) over (partition by userid) as latest_rank
    from (
      select userid,
        rank1,
        timestamp1,
        lag(rank1) over (partition by userid order by timestamp1) as prev_rank,
        -- N + 1 = 4 rows back
        lag(rank1, 4) over (partition by userid order by timestamp1) as prev_N_rank,
        -- rn = 1 is each user's newest row
        row_number() over (partition by userid order by timestamp1 desc) as rn
      from t1
      order by userid, timestamp1
    ) v1
  ) v2
) v3
where rn = 1
group by userid, trend_overall, delta_rank
order by userid, trend_overall, delta_rank

Updated SQL Fiddle
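A sketch of the UPDATE query in Python's built-in sqlite3 module (SQLite 3.25 or later assumed), with N turned into a hypothetical parameter n: the frame covers n preceding rows and the lag offset is n + 1. As a simplification of the query above, it reads the latest rank directly from the rn = 1 row instead of computing latest_rank with max(...) over (...), which yields the same value on that row. The function name latest_trend is hypothetical.

```python
import sqlite3

# Sample rows from the question, in a table named t1 as in this answer
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (userid INTEGER, rank1 INTEGER, timestamp1 INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, 1, 1435366459), (1, 2, 1435366458), (1, 3, 1435366457),
     (2, 8, 1435366456), (2, 6, 1435366455), (2, 7, 1435366454)],
)

def latest_trend(conn, n):
    """Per user, (userid, trend_overall, delta_rank) for the newest row.

    delta_rank compares the latest rank with the rank n + 1 rows earlier
    and is NULL (None) when the user has too few rows.
    """
    n = int(n)  # formatted into the SQL text, so force an int
    sql = f"""
    SELECT userid, trend_overall, rank1 - prev_n_rank AS delta_rank
    FROM (
        SELECT v2.*,
               MIN(positive_trend) OVER (PARTITION BY userid ORDER BY timestamp1
                                         ROWS BETWEEN {n} PRECEDING
                                         AND CURRENT ROW) AS trend_overall
        FROM (
            SELECT v1.*,
                   CASE WHEN prev_rank < rank1 THEN 0 ELSE 1 END AS positive_trend
            FROM (
                SELECT userid, rank1, timestamp1,
                       LAG(rank1) OVER (PARTITION BY userid
                                        ORDER BY timestamp1) AS prev_rank,
                       LAG(rank1, {n + 1}) OVER (PARTITION BY userid
                                                 ORDER BY timestamp1) AS prev_n_rank,
                       ROW_NUMBER() OVER (PARTITION BY userid
                                          ORDER BY timestamp1 DESC) AS rn
                FROM t1
            ) v1
        ) v2
    ) v3
    WHERE rn = 1  -- keep only each user's newest measurement
    ORDER BY userid
    """
    return conn.execute(sql).fetchall()

print(latest_trend(conn, 1))  # [(1, 1, -2), (2, 0, 1)]
```

With only three rows per user in the sample data, n = 3 leaves delta_rank as None because there is no row four measurements back; n = 1 gives user 1 a delta of -2 (improved) and user 2 a delta of +1 (worsened).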


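Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.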

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM