Finding the right partition using the lag window function
I have daily time series for different companies from different industries and work with PostgreSQL. I'll start right away with an example to explain my problem. What I have is this:
+------------+---------+----------+---+
| day        | company | industry | v |
+------------+---------+----------+---+
| 2012-01-12 | A       | consumer | 2 |
| 2012-01-12 | B       | consumer | 2 |
| 2012-01-12 | C       | health   | 4 |
| 2012-01-12 | D       | health   | 4 |
| 2012-01-13 | A       | consumer | 5 |
| 2012-01-13 | B       | consumer | 5 |
| 2012-01-13 | C       | health   | 7 |
| 2012-01-13 | D       | health   | 7 |
| 2012-01-16 | A       | consumer | 8 |
| 2012-01-16 | B       | consumer | 8 |
| 2012-01-16 | C       | health   | 3 |
| 2012-01-16 | D       | health   | 3 |
+------------+---------+----------+---+
There are different companies from different industries. The value v is a daily average per industry, so every company in the same industry has the same v on a given day. What I need is this:
+------------+---------+----------+---+-----------+
| day        | company | industry | v | delta_v   |
+------------+---------+----------+---+-----------+
| 2012-01-12 | A       | consumer | 2 | NULL      |
| 2012-01-12 | B       | consumer | 2 | NULL      |
| 2012-01-12 | C       | health   | 4 | NULL      |
| 2012-01-12 | D       | health   | 4 | NULL      |
| 2012-01-13 | A       | consumer | 5 | 1.5       |
| 2012-01-13 | B       | consumer | 5 | 1.5       |
| 2012-01-13 | C       | health   | 7 | 0.75      |
| 2012-01-13 | D       | health   | 7 | 0.75      |
| 2012-01-16 | A       | consumer | 8 | 0.6       |
| 2012-01-16 | B       | consumer | 8 | 0.6       |
| 2012-01-16 | C       | health   | 3 | -0.571428 |
| 2012-01-16 | D       | health   | 3 | -0.571428 |
+------------+---------+----------+---+-----------+
I need the daily change of variable v. For example, the average value of v for the industry "consumer" is 2 on 2012-01-12 and 5 on 2012-01-13, so the growth is (5-2)/2 = 1.5.
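The growth formula can be checked by hand against the sample data. The following is only a minimal sketch in Python; the helper name `growth` is hypothetical and not part of the original post. It applies (current - previous) / previous to the per-industry daily values from the table:

```python
def growth(values):
    """Relative change from one entry to the next.

    The first entry has no predecessor, so its change is None
    (the counterpart of NULL in the desired output).
    """
    if not values:
        return []
    out = [None]
    for prev, cur in zip(values, values[1:]):
        out.append((cur - prev) / prev)
    return out


consumer = [2, 5, 8]  # v for "consumer" on 2012-01-12, 2012-01-13, 2012-01-16
health = [4, 7, 3]    # v for "health" on the same days

consumer_delta = growth(consumer)
health_delta = growth(health)
print(consumer_delta)
print(health_delta)
```

Up to floating-point rounding, these are the delta_v values listed in the desired output for each industry.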
I tried this:
SELECT *
, (v - LAG(v) OVER (PARTITION BY industry ORDER BY day) )
/ LAG (v) OVER (PARTITION BY industry ORDER BY day) AS delta_v
FROM mytable
ORDER BY day, industry
The problem is that it also computes the change in v "intra-day", i.e. between rows of the same day, if there is more than one company from the same industry on that day.
I hope it just needs a small correction in the PARTITION BY clause, but I really can't figure out how to do it. Do you have any ideas that can help me?
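The intra-day effect can be reproduced on the sample data. This is a rough sketch, not the asker's setup: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption; it needs SQLite 3.25 or newer for window functions) and stores v as REAL so that division is not truncated.

```python
import sqlite3

# Sample rows from the question; v is stored as a float (assumption).
rows = [
    ("2012-01-12", "A", "consumer", 2.0), ("2012-01-12", "B", "consumer", 2.0),
    ("2012-01-12", "C", "health", 4.0), ("2012-01-12", "D", "health", 4.0),
    ("2012-01-13", "A", "consumer", 5.0), ("2012-01-13", "B", "consumer", 5.0),
    ("2012-01-13", "C", "health", 7.0), ("2012-01-13", "D", "health", 7.0),
    ("2012-01-16", "A", "consumer", 8.0), ("2012-01-16", "B", "consumer", 8.0),
    ("2012-01-16", "C", "health", 3.0), ("2012-01-16", "D", "health", 3.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (day TEXT, company TEXT, industry TEXT, v REAL)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

# The attempted query: the partition contains every company of an industry,
# so LAG(v) can return the value of another company on the same day.
query = """
SELECT day, company, industry, v,
       (v - LAG(v) OVER (PARTITION BY industry ORDER BY day))
         / LAG(v) OVER (PARTITION BY industry ORDER BY day) AS delta_v
FROM mytable
ORDER BY day, industry
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Only the first row of each industry gets NULL. Every second row within a day is compared with the other company of the same industry on that same day, which yields a change of 0 instead of the day-over-day growth.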
I think you want the company in there too:
SELECT t.*,
((v - LAG(v) OVER (PARTITION BY industry, company ORDER BY day) )
/ LAG (v) OVER (PARTITION BY industry, company ORDER BY day)
) AS delta_v
FROM mytable t
ORDER BY day, industry;
I'm not sure if Postgres actually calculates the lag() twice, but this is easier to maintain:
SELECT t.*,
       -- if v is an integer column, cast it (e.g. v::numeric) to avoid integer division
       (v / LAG(v) OVER (PARTITION BY industry, company ORDER BY day)) - 1 AS delta_v
FROM mytable t
ORDER BY day, industry;
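To check the single-lag() form against the desired output, here is a small sketch that runs it on the sample data. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption; it needs SQLite 3.25 or newer for window functions) and stores v as REAL so the division is not truncated. The extra `company` in the final ORDER BY is only there to make the printed order deterministic.

```python
import sqlite3

# Sample rows from the question; v is stored as a float (assumption).
rows = [
    ("2012-01-12", "A", "consumer", 2.0), ("2012-01-12", "B", "consumer", 2.0),
    ("2012-01-12", "C", "health", 4.0), ("2012-01-12", "D", "health", 4.0),
    ("2012-01-13", "A", "consumer", 5.0), ("2012-01-13", "B", "consumer", 5.0),
    ("2012-01-13", "C", "health", 7.0), ("2012-01-13", "D", "health", 7.0),
    ("2012-01-16", "A", "consumer", 8.0), ("2012-01-16", "B", "consumer", 8.0),
    ("2012-01-16", "C", "health", 3.0), ("2012-01-16", "D", "health", 3.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (day TEXT, company TEXT, industry TEXT, v REAL)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

# Partitioning by industry AND company gives each company its own series,
# so LAG(v) always refers to the same company's previous day.
query = """
SELECT t.day, t.company, t.industry, t.v,
       (v / LAG(v) OVER (PARTITION BY industry, company ORDER BY day)) - 1 AS delta_v
FROM mytable t
ORDER BY day, industry, company
"""
fixed = conn.execute(query).fetchall()
for row in fixed:
    print(row)
```

With one row per company per day, as in the sample, each partition has exactly one row per day, so the first day is NULL for all four companies and the later days match the requested delta_v values up to floating-point rounding.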