Aggregate function to detect trend in PostgreSQL

Question

I'm using a psql DB to store a data structure like this:

datapoint(userId, rank, timestamp)

where timestamp is the Unix Epoch milliseconds timestamp.

In this structure I store the rank of each user every day, so it looks like:

UserId   Rank  Timestamp
1        1     1435366459
1        2     1435366458
1        3     1435366457
2        8     1435366456
2        6     1435366455
2        7     1435366454

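So, in the sample data above, userId 1 improved its rank with each measurement, which means it has a positive trend, while userId 2 dropped in rank, which means it has a negative trend.

What I need to do is detect all users that have a positive trend based on the last N measurements.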

Answer 1

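One approach is to perform a linear regression on each user's rank and check whether the slope is positive or negative. Luckily, PostgreSQL has a built-in function for that, regr_slope, which takes the dependent variable first and the independent variable second: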

SELECT   user_id, regr_slope (rank1, timestamp1) AS slope
FROM     my_table
GROUP BY user_id

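This query gives you the basic functionality. Keep in mind that the sign describes the rank value over time: if a lower rank number is better, as in the question, an improving user has a negative slope. If you like, you can dress the result up with a case expression: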

SELECT user_id, 
       CASE WHEN slope > 0 THEN 'positive' 
            WHEN slope < 0 THEN 'negative' 
            ELSE 'steady' END AS trend
FROM   (SELECT   user_id, regr_slope (rank1, timestamp1) AS slope
        FROM     my_table
        GROUP BY user_id) t
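
To make the slope concrete, here is a minimal Python sketch of the ordinary least-squares slope, which is what regr_slope(Y, X) computes, applied to the sample rows from the question. The helper names slope, slopes_by_user and trend_label are made up for this illustration, and the labels simply mirror the CASE expression above.

```python
from collections import defaultdict

# Sample rows from the question: (user_id, rank, timestamp).
ROWS = [
    (1, 1, 1435366459),
    (1, 2, 1435366458),
    (1, 3, 1435366457),
    (2, 8, 1435366456),
    (2, 6, 1435366455),
    (2, 7, 1435366454),
]


def slope(points):
    """Least-squares slope of y over x for (x, y) pairs; None if undefined."""
    n = len(points)
    if n == 0:
        return None
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return None  # all x values identical: no slope can be fitted
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def slopes_by_user(rows):
    """Group rows by user and fit rank (y) against timestamp (x)."""
    grouped = defaultdict(list)
    for user_id, rank, timestamp in rows:
        grouped[user_id].append((timestamp, rank))
    return {user_id: slope(points) for user_id, points in grouped.items()}


def trend_label(value):
    """Same labelling as the CASE expression; an undefined slope falls into 'steady'."""
    if value is not None and value > 0:
        return "positive"
    if value is not None and value < 0:
        return "negative"
    return "steady"


for user_id, value in sorted(slopes_by_user(ROWS).items()):
    print(user_id, value, trend_label(value))
```

On this data user 1, whose rank number falls from 3 to 1 over time, gets a slope of -1.0 and the label 'negative', while user 2 gets 0.5 and 'positive'.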

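EDIT:
Unfortunately, regr_slope has no built-in way to handle "top N" type requirements, so this has to be handled separately, for example with a subquery using row_number: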

-- Decoration outer query
SELECT user_id, 
       CASE WHEN slope > 0 THEN 'positive' 
            WHEN slope < 0 THEN 'negative' 
            ELSE 'steady' END AS trend
FROM   (-- Inner query to calculate the slope
        SELECT   user_id, regr_slope (rank1, timestamp1) AS slope
        FROM     (-- Inner query to get top N
                  SELECT user_id, rank1, timestamp1,
                         ROW_NUMBER() OVER (PARTITION BY user_id 
                                           ORDER BY timestamp1 DESC) AS rn
                  FROM   my_table) t
        WHERE    rn <= N -- Replace N with the number of rows you need
        GROUP BY user_id) t2
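
The row-numbering step can be tried on its own. The following is a minimal sketch, assuming Python's bundled sqlite3 module is linked against SQLite 3.25 or later (required for window functions). SQLite has no regr_slope, so only the "latest N rows per user" subquery is exercised here; the function name latest_n_per_user and the choice N = 2 are arbitrary.

```python
import sqlite3

# Sample rows from the question: (user_id, rank1, timestamp1).
ROWS = [
    (1, 1, 1435366459),
    (1, 2, 1435366458),
    (1, 3, 1435366457),
    (2, 8, 1435366456),
    (2, 6, 1435366455),
    (2, 7, 1435366454),
]

# Same shape as the innermost subquery plus its rn filter.
TOP_N_SQL = """
SELECT user_id, rank1, timestamp1
FROM  (SELECT user_id, rank1, timestamp1,
              ROW_NUMBER() OVER (PARTITION BY user_id
                                 ORDER BY timestamp1 DESC) AS rn
       FROM   my_table) t
WHERE  rn <= ?
ORDER BY user_id, timestamp1 DESC
"""


def latest_n_per_user(rows, n):
    """Load rows into an in-memory table and return the latest n rows per user."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE my_table "
            "(user_id INTEGER, rank1 INTEGER, timestamp1 INTEGER)"
        )
        conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
        return conn.execute(TOP_N_SQL, (n,)).fetchall()
    finally:
        conn.close()


for row in latest_n_per_user(ROWS, 2):
    print(row)
```

Note that the subquery has to carry timestamp1 through, since the slope is computed against it one level up.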

Answer 2

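You can use analytic functions. Overall approach:

1. Use lag() to compute the previous rank.
2. Use a case expression to decide whether the trend is positive (0 or 1).
3. Use min() to get the minimum trend over the preceding N rows; it returns 1 if the trend was positive for all of those rows and 0 otherwise. To limit it to N rows, use the window clause rows between N preceding and 0 following (the frame covers the current row plus the N rows before it).

Code: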

select v2.*,
  min(positive_trend) over (partition by userid order by timestamp1
                             rows between 3 preceding and 0 following) as trend_overall
from (
  select v1.*,
    (case when prev_rank < rank1 then 0 else 1 end) as positive_trend
  from (
    select userid,
      rank1,
      timestamp1,
      lag(rank1) over (partition by userid order by timestamp1) as prev_rank
    from t1
    order by userid, timestamp1
  ) v1
) v2
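
The three steps can also be traced by hand. Below is a minimal pure-Python sketch of the same logic (previous rank, 0/1 flag, rolling minimum) run on the sample rows from the question. The function name trend_overall and the tuple layout are assumptions made for illustration; a missing previous rank counts as positive, as it does in the case expression above.

```python
from collections import defaultdict

# Sample rows from the question: (userid, rank1, timestamp1).
ROWS = [
    (1, 1, 1435366459),
    (1, 2, 1435366458),
    (1, 3, 1435366457),
    (2, 8, 1435366456),
    (2, 6, 1435366455),
    (2, 7, 1435366454),
]

PRECEDING = 3  # as in "rows between 3 preceding and 0 following"


def trend_overall(rows, preceding=PRECEDING):
    """Return {userid: [(timestamp1, rank1, positive_trend, trend_overall), ...]}."""
    by_user = defaultdict(list)
    for userid, rank1, timestamp1 in rows:
        by_user[userid].append((timestamp1, rank1))

    result = {}
    for userid, series in by_user.items():
        series.sort()  # order by timestamp1
        flags = []
        out = []
        for i, (timestamp1, rank1) in enumerate(series):
            # lag(rank1): rank of the previous row, None on the first row
            prev_rank = series[i - 1][1] if i > 0 else None
            # case when prev_rank < rank1 then 0 else 1 end
            positive = 0 if (prev_rank is not None and prev_rank < rank1) else 1
            flags.append(positive)
            # min() over the current row and up to `preceding` rows before it
            window = flags[max(0, i - preceding): i + 1]
            out.append((timestamp1, rank1, positive, min(window)))
        result[userid] = out
    return result


for userid, series in sorted(trend_overall(ROWS).items()):
    print(userid, series[-1])  # latest row per user
```

On the sample data the latest row of user 1 ends with trend_overall 1, and the latest row of user 2 ends with 0 because its rank number went up from 6 to 8.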


UPDATE

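To get just the user IDs together with the overall trend and the rank delta, you have to add another call, lag(.., N+1), to get the rank from N+1 rows back, and row_number() to number the rows within each user ID: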

select v3.userid, v3.trend_overall, delta_rank
from (  
  select v2.*,
    min(positive_trend) over (partition by userid order by timestamp1
                               rows between 3 preceding and 0 following) as trend_overall,
    latest_rank - prev_N_rank as delta_rank
  from (
    select v1.*,
      (case when prev_rank < rank1 then 0 else 1 end) as positive_trend,
      max(case when v1.rn = 1 then rank1 else NULL end) over (partition by userid) as latest_rank
    from (
      select userid,
        rank1,
        timestamp1,
        lag(rank1) over (partition by userid order by timestamp1) as prev_rank,
        lag(rank1, 4) over (partition by userid order by timestamp1) as prev_N_rank,
        row_number() over (partition by userid order by timestamp1 desc) as rn
      from t1
      order by userid, timestamp1
    ) v1
  ) v2
) v3 
where rn = 1
group by userid, trend_overall, delta_rank
order by userid, trend_overall, delta_rank
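
The delta_rank column is the latest rank minus the rank a fixed number of rows earlier. Here is a minimal Python sketch of that calculation on the sample rows from the question; the function name delta_rank and its offset parameter are made up for illustration, with offset playing the role of the second argument to lag().

```python
from collections import defaultdict

# Sample rows from the question: (userid, rank1, timestamp1).
ROWS = [
    (1, 1, 1435366459),
    (1, 2, 1435366458),
    (1, 3, 1435366457),
    (2, 8, 1435366456),
    (2, 6, 1435366455),
    (2, 7, 1435366454),
]


def delta_rank(rows, offset):
    """Latest rank minus the rank `offset` rows earlier, per user.

    Yields None when a user has too few rows, just as lag() yields NULL.
    """
    by_user = defaultdict(list)
    for userid, rank1, timestamp1 in rows:
        by_user[userid].append((timestamp1, rank1))

    deltas = {}
    for userid, series in by_user.items():
        series.sort()  # order by timestamp1
        latest_rank = series[-1][1]  # the row where rn = 1
        if len(series) > offset:
            prev_n_rank = series[-1 - offset][1]  # lag(rank1, offset) at the latest row
            deltas[userid] = latest_rank - prev_n_rank
        else:
            deltas[userid] = None
    return deltas


print(delta_rank(ROWS, 2))
print(delta_rank(ROWS, 4))
```

With only three rows per user in the sample, an offset of 4 (as in the query) gives no delta for either user, while an offset of 2 gives -2 for user 1 and 1 for user 2.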


Answer 1
6 accepted 2014-02-26 11:38:35

Answer 2
3 2014-02-26 12:13:46
