Finding the right partition using the lag window function

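I have daily time series for different companies from different industries, and I work with PostgreSQL. I'll start with an example to explain my problem. What I have is: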

+------------+---------+-------------+----+
|    day     | company | industry    | v  |
+------------+---------+-------------+----+
| 2012-01-12 | A       | consumer    | 2  |
| 2012-01-12 | B       | consumer    | 2  |
| 2012-01-12 | C       | health      | 4  |
| 2012-01-12 | D       | health      | 4  |
| 2012-01-13 | A       | consumer    | 5  |
| 2012-01-13 | B       | consumer    | 5  |
| 2012-01-13 | C       | health      | 7  |
| 2012-01-13 | D       | health      | 7  |
| 2012-01-16 | A       | consumer    | 8  |
| 2012-01-16 | B       | consumer    | 8  |
| 2012-01-16 | C       | health      | 3  |
| 2012-01-16 | D       | health      | 3  |
+------------+---------+-------------+----+

Different companies belong to different industries, and v is the daily average value for the company's industry. What I need is:

+------------+---------+----------+---+------------+
|    day     | company | industry | v | delta_v    |
+------------+---------+----------+---+------------+
| 2012-01-12 | A       | consumer | 2 | NULL       |
| 2012-01-12 | B       | consumer | 2 | NULL       |
| 2012-01-12 | C       | health   | 4 | NULL       |
| 2012-01-12 | D       | health   | 4 | NULL       |
| 2012-01-13 | A       | consumer | 5 | 1.5        |
| 2012-01-13 | B       | consumer | 5 | 1.5        |
| 2012-01-13 | C       | health   | 7 | 0.75       |
| 2012-01-13 | D       | health   | 7 | 0.75       |
| 2012-01-16 | A       | consumer | 8 | 0.6        |
| 2012-01-16 | B       | consumer | 8 | 0.6        |
| 2012-01-16 | C       | health   | 3 | -0.571428  |
| 2012-01-16 | D       | health   | 3 | -0.571428  |
+------------+---------+----------+---+------------+

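I need the daily change in the variable v. For example, the industry "consumer" has an average value of 2 on 2012-01-12 and of 5 on 2012-01-13, so the growth rate is (5 - 2) / 2 = 1.5.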
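One caveat when turning this formula into SQL: the question does not state the column type of v. If v happens to be an integer column (an assumption here), PostgreSQL's `/` operator truncates the result. A minimal sketch of the difference, using the numbers from the example:

```sql
-- Assumption: v may be stored as an integer type; the question does not say.
SELECT (5 - 2) / 2          AS integer_division,  -- integer / integer truncates to 1
       (5 - 2) / 2::numeric AS numeric_division;  -- casting one operand gives 1.5
```

If v is already numeric or a floating-point type, no cast is needed.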

I tried this:

    SELECT * 
           , (v - LAG(v) OVER (PARTITION BY industry ORDER BY day) )
           / LAG (v) OVER (PARTITION BY industry ORDER BY day) AS delta_v
    FROM mytable
    ORDER BY day, industry

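The problem is that when several companies from the same industry appear on the same day, the query also computes the change in v "within days", that is, between rows of the same day.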
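The "within days" effect can be made visible by selecting the lagged day next to each row. This is a minimal sketch on a subset of the sample data. The inline `VALUES` list stands in for the real table, and `company` is added to the window's `ORDER BY` only to make the order of same-day rows deterministic (in the original query that order is arbitrary).

```sql
-- Sketch: shows which row LAG() picks up when partitioning by industry only.
WITH mytable (day, company, industry, v) AS (
    VALUES
        (DATE '2012-01-12', 'A', 'consumer', 2),
        (DATE '2012-01-12', 'B', 'consumer', 2),
        (DATE '2012-01-13', 'A', 'consumer', 5),
        (DATE '2012-01-13', 'B', 'consumer', 5)
)
SELECT day, company, v,
       LAG(day) OVER w AS lag_day,  -- day of the row that LAG() reads from
       LAG(v)   OVER w AS lag_v
FROM mytable
WINDOW w AS (PARTITION BY industry ORDER BY day, company)
ORDER BY day, company;
```

With this ordering, the rows for company B get a `lag_day` equal to their own day, because the previous row in the partition is company A on the same date.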

I hope only a small correction to the PARTITION BY clause is needed, but I really don't know how to do it. Do you have any ideas that could help me?

I think you want company in there as well:

    SELECT t.*,
           ((v - LAG(v) OVER (PARTITION BY industry, company ORDER BY day) )
            / LAG (v) OVER (PARTITION BY industry, company ORDER BY day)
           ) AS delta_v
    FROM mytable t
    ORDER BY day, industry;

I'm not sure whether Postgres actually calculates lag() twice, but this is easier to maintain:

    SELECT t.*,
           (v / LAG(v) OVER (PARTITION BY industry, company ORDER BY day)) - 1 AS delta_v
    FROM mytable t
    ORDER BY day, industry;
