
Postgres: aggregation function that returns a column

In postgres, when I call a function on some data, like so:

select f(col_nums) from tbl_name
where col_str = '12345'

then the function f is applied to each row where col_str = '12345'.

On the other hand, if I call an aggregation function on some data, like so:

select g_agg(col_nums) from tbl_name
where col_str = '12345'

then the function g_agg is called on the entire column but returns a single value.

Q: How can I make a function that is applied to the entire column and returns a column of the same size, while being aware of all the values in the subset?

For example, can I create a function to calculate cumulative sum?

select *, sum_accum(col_nums) as cs from tbl_name
where col_str = '12345'

such that the result of the above query would look like this:

 col_str | more_cols |  col_nums   | cs
---------+-----------+-------------+----
  12345  |    567    |     1       |  1
  12345  |    568    |     2       |  3
  12345  |    569    |     3       |  6
  12345  |    570    |     4       | 10

Is there no choice but to pass a sub-query result to a function and then join with the original table?

Use window functions. The PostgreSQL documentation describes them as follows:

A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.

For example:

select *, sum(col_nums) OVER(PARTITION BY T.COLX, T.COLY) as cs 
from tbl_name T
where col_str = '12345'

Note that it is the addition of an OVER clause that turns an aggregate from its traditional use into a window function:

the OVER clause causes it to be treated as a window function and computed across an appropriate set of rows
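
The same holds for aggregates you define yourself, so a function like the sum_accum from the question needs no window-specific code. The following is a minimal sketch, not a recommended implementation: the aggregate definition is hypothetical, it assumes col_nums is an integer column, and the inline rows simply copy the sample output from the question. The built-in sum does the same job.

```sql
-- Hypothetical user-defined aggregate, named after sum_accum in the question.
-- int4pl is PostgreSQL's built-in integer addition function.
CREATE AGGREGATE sum_accum(integer) (
    SFUNC    = int4pl,   -- state transition: state + next value
    STYPE    = integer,  -- type of the running state
    INITCOND = '0'       -- start the sum at zero
);

-- Inline stand-in for tbl_name, using the rows shown in the question.
WITH tbl_name(col_str, more_cols, col_nums) AS (
    VALUES ('12345', 567, 1),
           ('12345', 568, 2),
           ('12345', 569, 3),
           ('12345', 570, 4)
)
-- With OVER, the aggregate returns one value per row instead of one per group.
SELECT t.*, sum_accum(col_nums) OVER (ORDER BY more_cols) AS cs
FROM tbl_name t
WHERE col_str = '12345'
ORDER BY more_cols;
-- cs: 1, 3, 6, 10
```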

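The OVER clause can contain a PARTITION BY (analogous to GROUP BY), which controls the window that the calculation is performed in. It also allows an ORDER BY: with it, an aggregate is computed over the rows from the start of the partition up to the current row (including any rows that tie with it in the ordering), which gives a running value; without it, the aggregate covers the whole partition.

select *
   -- running sum: ORDER BY limits each row's sum to the rows up to and including it
 , sum(col_nums) OVER(PARTITION BY T.COLX ORDER BY T.COLY) as cs

   -- no ORDER BY: every row gets the count of its whole partition
 , count(*) OVER(PARTITION BY T.COLX) as cs_count
from tbl_name T
where col_str = '12345'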
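
One detail worth checking when the ORDER BY column has duplicate values: by default the window frame extends to all rows that tie with the current row, so tied rows share the same running total. The sketch below uses a made-up inline table (demo, ord and val are illustrative names, not from the question) to contrast the default frame with a strict row-by-row total obtained from a unique ordering and an explicit ROWS frame.

```sql
-- Hypothetical data: two rows tie on ord = 1.
WITH demo(ord, val) AS (
    VALUES (1, 10), (1, 20), (2, 30)
)
SELECT ord, val,
       -- default frame: tied rows are summed together
       sum(val) OVER (ORDER BY ord) AS range_sum,
       -- unique ordering plus ROWS frame: one row added at a time
       sum(val) OVER (ORDER BY ord, val
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum
FROM demo
ORDER BY ord, val;
-- range_sum: 30, 30, 60
-- rows_sum:  10, 30, 60
```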

The function that you want is a cumulative sum. This is handled by window functions:

select t.*, sum(col_nums) over (order by more_cols) as cs
from tbl_name t
where col_str = '12345';

I am guessing that the ordering is defined by the second column (more_cols). It can be any column, including col_nums.

You can do this for all values of col_str at the same time, using the partition by clause:

select t.*, sum(col_nums) over (partition by col_str order by more_cols) as cs
from tbl_name t;
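
To see the effect of the partition, the query can be tried on inline data. This is only a sketch: the first four rows copy the sample output from the question, the two '67890' rows are invented for illustration, and the final ORDER BY is added just to make the output order deterministic.

```sql
-- Inline stand-in for tbl_name; the '67890' rows are hypothetical.
WITH tbl_name(col_str, more_cols, col_nums) AS (
    VALUES ('12345', 567, 1),
           ('12345', 568, 2),
           ('12345', 569, 3),
           ('12345', 570, 4),
           ('67890', 100, 5),
           ('67890', 101, 7)
)
SELECT t.*,
       -- the running sum restarts for each distinct col_str
       sum(col_nums) OVER (PARTITION BY col_str ORDER BY more_cols) AS cs
FROM tbl_name t
ORDER BY col_str, more_cols;
-- cs for '12345': 1, 3, 6, 10
-- cs for '67890': 5, 12
```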
