Cumulative sum over a table

What is the best way to perform a cumulative sum over a table in Postgres, so that it performs well and stays flexible if more fields / columns are added to the table?

Table

    a    b    d
1   59   15   181
2        16   268
3             219
4             102

Cumulative

    a    b    d
1   59   15   181
2        31   449
3             668
4             770

Use window functions for a running sum.

SELECT sum(a) OVER (ORDER BY d) as "a",
       sum(b) OVER (ORDER BY d) as "b",
       sum(d) OVER (ORDER BY d) as "d"
FROM tbl;  -- "table" is a reserved word, so the table is called tbl here

If you have more than one running sum, make sure they all use the same ORDER BY.
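A minimal sketch of such a running sum on the question's sample data, run through Python's sqlite3 module as a stand-in for Postgres (an assumption made only so the example is self-contained; it needs SQLite 3.25 or later for window functions). The table name tbl and the ordering column id are also assumptions, since the question does not name the first column.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, d INTEGER)"
)
# Sample rows from the question; empty cells are stored as NULL.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [(1, 59, 15, 181), (2, None, 16, 268), (3, None, None, 219), (4, None, None, 102)],
)

# Every running sum uses the same ordering column.
running = conn.execute(
    """
    SELECT id,
           sum(a) OVER (ORDER BY id) AS a,
           sum(b) OVER (ORDER BY id) AS b,
           sum(d) OVER (ORDER BY id) AS d
    FROM tbl
    ORDER BY id
    """
).fetchall()

for row in running:
    print(row)
# (1, 59, 15, 181)
# (2, 59, 31, 449)
# (3, 59, 31, 668)
# (4, 59, 31, 770)
```

Note that sum() skips NULLs, so the last non-null total is carried forward into rows where the source column is NULL. That differs from the desired output in the question, which leaves those cells empty.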


It's important to note that if you want your columns to appear like the aggregate table in your question (each field ordered independently), it is a little more involved.


Update: I've modified the query to do the required sorting without relying on a common field.

SQL Fiddle: (1) Only Aggregates, or (2) Source Data Beside Running Sum

WITH 
rcd AS ( 
  select row_number() OVER() as num,a,b,d 
  from tbl
),
sorted_a AS (
  select row_number() OVER(w1) as num, sum(a) over(w2) a
  from tbl
  window w1 as (order by a nulls last),
         w2 as (order by a nulls first)
),
sorted_b AS (
  select row_number() OVER(w1) as num, sum(b) over(w2) b
  from tbl
  window w1 as (order by b nulls last),
         w2 as (order by b nulls first)
),
sorted_d AS (
  select row_number() OVER(w1) as num, sum(d) over(w2) d
  from tbl
  window w1 as (order by d nulls last),
         w2 as (order by d nulls first)
)

SELECT sorted_a.a, sorted_b.b, sorted_d.d 
FROM rcd 
JOIN sorted_a USING(num)
JOIN sorted_b USING(num)
JOIN sorted_d USING(num)
ORDER BY num;

You can use window functions, but you need additional logic to avoid producing values in rows where the source column is NULL:

SELECT id,
       (case when a is not null then sum(a) OVER (ORDER BY id) end) as a,
       (case when b is not null then sum(b) OVER (ORDER BY id) end) as b,
       (case when d is not null then sum(d) OVER (ORDER BY id) end) as d
FROM tbl;  -- "table" is a reserved word, so the table is called tbl here

This assumes that the first column, which specifies the ordering, is called id.
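As a rough check of the CASE approach above, here is a sketch on the question's sample data. It uses Python's sqlite3 module in place of Postgres (an assumption, needing SQLite 3.25 or later), and the names tbl and id are assumed as well.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, d INTEGER)"
)
# Sample rows from the question; empty cells are stored as NULL.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [(1, 59, 15, 181), (2, None, 16, 268), (3, None, None, 219), (4, None, None, 102)],
)

# The CASE expression blanks out the running total wherever the source is NULL.
masked = conn.execute(
    """
    SELECT id,
           CASE WHEN a IS NOT NULL THEN sum(a) OVER (ORDER BY id) END AS a,
           CASE WHEN b IS NOT NULL THEN sum(b) OVER (ORDER BY id) END AS b,
           CASE WHEN d IS NOT NULL THEN sum(d) OVER (ORDER BY id) END AS d
    FROM tbl
    ORDER BY id
    """
).fetchall()

for row in masked:
    print(row)
# (1, 59, 15, 181)
# (2, None, 31, 449)
# (3, None, None, 668)
# (4, None, None, 770)
```

The result matches the "Cumulative" table in the question, with None standing for the empty cells.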

I think what you are really looking for is this:

SELECT id
     , sum(a) OVER (PARTITION BY a_grp ORDER BY id) as a
     , sum(b) OVER (PARTITION BY b_grp ORDER BY id) as b
     , sum(d) OVER (PARTITION BY d_grp ORDER BY id) as d 
FROM  (
   SELECT *
        , count(a IS NULL OR NULL) OVER (ORDER BY id) as a_grp
        , count(b IS NULL OR NULL) OVER (ORDER BY id) as b_grp
        , count(d IS NULL OR NULL) OVER (ORDER BY id) as d_grp
   FROM   tbl
   ) sub
ORDER  BY id;

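The expression count(col IS NULL OR NULL) OVER (ORDER BY id) forms groups of consecutive non-null rows for a, b and d in the subquery sub. It works because col IS NULL OR NULL evaluates to TRUE for NULL values and to NULL otherwise, and count() only counts non-null inputs, so the running count goes up by one at every NULL.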

In the outer query we run cumulative sums per group. Each NULL value starts a new group and stays NULL automatically, because the sum up to that row has no non-null input. No additional CASE expression is necessary.
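To make the grouping visible, here is a small sketch for column a only, again using Python's sqlite3 module as a stand-in for Postgres (an assumption, needing SQLite 3.25 or later). The six sample values are hypothetical and were picked so that non-null values follow a NULL.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, a INTEGER)")
# Hypothetical values: two NULLs, each followed by non-null rows.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, 59), (2, None), (3, 10), (4, 5), (5, None), (6, 7)],
)

# a_grp goes up by one at every NULL; the running sum restarts per group.
grouped = conn.execute(
    """
    SELECT id,
           a_grp,
           sum(a) OVER (PARTITION BY a_grp ORDER BY id) AS a
    FROM (
        SELECT id, a,
               count(a IS NULL OR NULL) OVER (ORDER BY id) AS a_grp
        FROM tbl
    ) sub
    ORDER BY id
    """
).fetchall()

for row in grouped:
    print(row)
# (1, 0, 59)
# (2, 1, None)
# (3, 1, 10)
# (4, 1, 15)
# (5, 2, None)
# (6, 2, 7)
```

Unlike the CASE variant, the total here restarts after every NULL: row 4 shows 15 (10 + 5) rather than 74 (59 + 10 + 5). Which behavior is wanted depends on what the NULLs mean in your data.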

SQL Fiddle (with some added values for column a to demonstrate the effect).
