How can I apply aggregate functions element-wise over arrays in PostgreSQL, e.g. weighted array sums over a group?

I have a table such as the following (see db<>fiddle):

+---+-+---------+
|grp|n|vals     |
+---+-+---------+
|0  |2|{1,2,3,4}|
|1  |5|{3,2,1,2}|
|1  |3|{0,5,4,3}|
+---+-+---------+

where, for each group (defined by grp), I would like to perform some arithmetic involving the group's scalars n and arrays vals. I'm interested in a kind of weighted sum: each row's vals is multiplied by its n, and the resulting arrays are summed element-wise within each group, producing one array per group:

+---+-------------+
|grp|result       |
+---+-------------+
|0  |{2,4,6,8}    |
|1  |{15,25,17,19}|
+---+-------------+

Here's what I've tried. This fails with an error (aggregate function calls cannot contain set-returning function calls):

SELECT
    grp,
    ARRAY(SELECT SUM(n * UNNEST(vals)))
FROM
    tbl
GROUP BY
    grp

The error includes a hint, but I am unable to make sense of it for my use case.

The following runs, but it collapses each group's array down to a single scalar instead of summing per position:

SELECT
    grp,
    SUM(n * vals[i])
FROM
    tbl,
    generate_series(1, 4) i
GROUP BY
    grp

Only this sort of works:

SELECT
    grp,
    SUM(n * vals[1]),
    SUM(n * vals[2]),
    SUM(n * vals[3]),
    SUM(n * vals[4])
FROM
    tbl
GROUP BY
    grp

but it doesn't produce an array, and it requires writing out each element of the array separately. In my case the arrays are much longer than four elements, so this is too awkward.

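WITH flattened AS (
    -- expand each array into (val, position) rows, then sum the weighted values per position
    SELECT grp, position, SUM(val * n) AS s
    FROM tbl, unnest(vals) WITH ORDINALITY AS f(val, position)
    GROUP BY grp, position
)
-- reassemble the per-position sums into one array per group, in position order
SELECT grp, array_agg(s ORDER BY position)
FROM flattened
GROUP BY grp
;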

+---+-------------------------------------------------------------------------------------+
|grp|array_agg                                                                            |
+---+-------------------------------------------------------------------------------------+
|0  |{2.00000000000000000,4.00000000000000000,6.00000000000000000,8.00000000000000000}    |
|1  |{15.00000000000000000,25.00000000000000000,17.00000000000000000,19.00000000000000000}|
+---+-------------------------------------------------------------------------------------+

Explanation:

You can use UNNEST ... WITH ORDINALITY to keep track of the position of each value:

SELECT grp, position, val, n
FROM tbl, unnest(vals) WITH ORDINALITY AS f(val, position);

+---+--------+---+-+
|grp|position|val|n|
+---+--------+---+-+
|0  |1       |1  |2|
|0  |2       |2  |2|
|0  |3       |3  |2|
|0  |4       |4  |2|
|1  |1       |3  |5|
|1  |2       |2  |5|
|1  |3       |1  |5|
|1  |4       |2  |5|
|1  |1       |0  |3|
|1  |2       |5  |3|
|1  |3       |4  |3|
|1  |4       |3  |3|
+---+--------+---+-+

Then GROUP BY the original group and each position:

SELECT grp, position, SUM(val * n) AS s
FROM tbl, unnest(vals) WITH ORDINALITY AS f(val, position)
GROUP BY grp, position
ORDER BY grp, position;

+---+--------+--+
|grp|position|s |
+---+--------+--+
|0  |1       |2 |
|0  |2       |4 |
|0  |3       |6 |
|0  |4       |8 |
|1  |1       |15|
|1  |2       |25|
|1  |3       |17|
|1  |4       |19|
+---+--------+--+

After that, you only need ARRAY_AGG(s ORDER BY position), as in the full query at the top of this answer, to collect the per-position sums into one array per group.
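The steps above can be tried without the original table. The following is a minimal sketch that inlines the three sample rows from the question as a CTE; the integer and integer[] types are an assumption (the question does not state the column types), and the alias names u, pos and per_position are only illustrative. It spells the join as CROSS JOIN LATERAL, which is equivalent to the comma form used above and seems to be what the error's hint in the question points towards.

```sql
-- Sample rows from the question, inlined so the statement runs on its own.
-- Assumption: n is integer and vals is integer[].
WITH tbl(grp, n, vals) AS (
    VALUES (0, 2, ARRAY[1,2,3,4]),
           (1, 5, ARRAY[3,2,1,2]),
           (1, 3, ARRAY[0,5,4,3])
),
per_position AS (
    -- one row per (grp, array position) holding the weighted sum for that position
    SELECT grp, pos, SUM(val * n) AS s
    FROM tbl
    CROSS JOIN LATERAL unnest(vals) WITH ORDINALITY AS u(val, pos)
    GROUP BY grp, pos
)
-- rebuild one array per group, ordered by position
SELECT grp, array_agg(s ORDER BY pos) AS result
FROM per_position
GROUP BY grp
ORDER BY grp;
-- grp 0 -> {2,4,6,8}
-- grp 1 -> {15,25,17,19}
```

With these assumed integer types the sums come back as bigint, so the arrays print without decimals.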

I would write functions for that, otherwise the SQL will get really messy.

One function to multiply all elements by a given value:

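create function array_mul(p_input real[], p_mul int)
  returns real[]
as
$$
  -- multiply every element by p_mul, keeping the original element order;
  -- the cast keeps the elements real so the array matches the declared return type
  select array(select (i * p_mul)::real
               from unnest(p_input) with ordinality as t(i,idx)
               order by idx);
$$
language sql
immutable;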

And one function, to be used in an aggregate, that sums up the elements with the same index:

create or replace function array_add(p_one real[], p_two real[])
  returns real[]
as
$$
declare
  l_result real[];
begin
  -- if either input is null, return the other one unchanged
  if p_one is null or p_two is null then
    return coalesce(p_one, p_two);
  end if;

  -- add element by element; a missing element in the shorter array counts as 0
  for l_idx in 1..greatest(cardinality(p_one), cardinality(p_two)) loop
    l_result[l_idx] := coalesce(p_one[l_idx],0) + coalesce(p_two[l_idx], 0);
  end loop;

  return l_result;
end;
$$
language plpgsql
immutable;
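As a quick sanity check, the two helpers can be called directly before wiring them into an aggregate. This is only a sketch and assumes array_mul and array_add have been created exactly as shown above; the input literals are taken from the group 1 rows of the question, plus one made-up pair of arrays of unequal length.

```sql
-- 5 * {3,2,1,2}
select array_mul('{3,2,1,2}'::real[], 5);
-- {15,10,5,10}

-- element-wise sum of the two weighted arrays of group 1
select array_add('{15,10,5,10}'::real[], '{0,15,12,9}'::real[]);
-- {15,25,17,19}

-- arrays of different length: the missing element is treated as 0
select array_add('{1,2}'::real[], '{10,20,30}'::real[]);
-- {11,22,30}
```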

That can be used to define a custom aggregate:

create aggregate array_element_sum(real[]) (
  sfunc = array_add,
  stype = real[],
  initcond = '{}'
);

And then your query is as simple as:

select grp, array_element_sum(array_mul(vals, n))
from tbl
group by grp;
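To try the aggregate without the original table, the sample rows from the question can be inlined. A minimal sketch, assuming array_mul, array_add and array_element_sum exist as defined above; the explicit cast to real[] is there only because the inlined array literals are integer arrays in this example.

```sql
select grp, array_element_sum(array_mul(vals::real[], n)) as result
from (values (0, 2, array[1,2,3,4]),
             (1, 5, array[3,2,1,2]),
             (1, 3, array[0,5,4,3])) as tbl(grp, n, vals)
group by grp
order by grp;
-- grp 0 -> {2,4,6,8}
-- grp 1 -> {15,25,17,19}
```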

Online example
