How can I apply aggregate functions element-wise over arrays in PostgreSQL, e.g. weighted array sums over a group?
I have a table such as the following (see db<>fiddle):
grp | n | vals |
---|---|---|
0 | 2 | {1,2,3,4} |
1 | 5 | {3,2,1,2} |
1 | 3 | {0,5,4,3} |
where for each group (defined by grp) I would like to perform some arithmetic involving the group's scalars n and arrays vals. I'm interested in a kind of weighted sum, such that each row's vals are multiplied by its n, and the resulting arrays are summed element-wise within each group, outputting one array per group:
grp | result |
---|---|
0 | {2,4,6,8} |
1 | {15,25,17,19} |
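To reproduce the examples, the sample table could be set up roughly as follows. This is only a sketch: the column types are an assumption (real[] is picked as one plausible element type), so adjust them to the actual schema.

```sql
-- Hypothetical setup: the column types are assumed, not taken from the fiddle.
CREATE TABLE tbl (
    grp  int,
    n    int,
    vals real[]
);

INSERT INTO tbl (grp, n, vals) VALUES
    (0, 2, '{1,2,3,4}'),
    (1, 5, '{3,2,1,2}'),
    (1, 3, '{0,5,4,3}');
```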
Here's what I've tried. This fails with an error (aggregate function calls cannot contain set-returning function calls):
SELECT
    grp,
    ARRAY(SELECT SUM(n * UNNEST(vals)))
FROM
    tbl
GROUP BY
    grp
The error includes a hint, but I am unable to make sense of it for my use case.
The following sums the desired arrays down to scalars:
SELECT
    grp,
    SUM(n * vals[i])
FROM
    tbl,
    generate_series(1, 4) i
GROUP BY
    grp
Only this sort of works:
SELECT
    grp,
    SUM(n * vals[1]),
    SUM(n * vals[2]),
    SUM(n * vals[3]),
    SUM(n * vals[4])
FROM
    tbl
GROUP BY
    grp
but it doesn't result in an array, and it involves writing out each element of the array separately. In my case the arrays are much longer than four elements, so this is too awkward.
WITH flattened AS (
SELECT grp, position, SUM(val * n) AS s
FROM tbl, unnest(vals) WITH ORDINALITY AS f(val, position)
GROUP BY grp, position
ORDER BY grp, position
)
SELECT grp, array_agg(s ORDER BY position)
FROM flattened
GROUP BY grp
;
+---+-------------------------------------------------------------------------------------+
|grp|array_agg |
+---+-------------------------------------------------------------------------------------+
|0 |{2.00000000000000000,4.00000000000000000,6.00000000000000000,8.00000000000000000} |
|1 |{15.00000000000000000,25.00000000000000000,17.00000000000000000,19.00000000000000000}|
+---+-------------------------------------------------------------------------------------+
Explanation:
You can use UNNEST ... WITH ORDINALITY to keep track of the position of each value:
SELECT grp, position, val, n
FROM tbl, unnest(vals) WITH ORDINALITY AS f(val, position);
+---+--------+---+-+
|grp|position|val|n|
+---+--------+---+-+
|0 |1 |1 |2|
|0 |2 |2 |2|
|0 |3 |3 |2|
|0 |4 |4 |2|
|1 |1 |3 |5|
|1 |2 |2 |5|
|1 |3 |1 |5|
|1 |4 |2 |5|
|1 |1 |0 |3|
|1 |2 |5 |3|
|1 |3 |4 |3|
|1 |4 |3 |3|
+---+--------+---+-+
Then GROUP BY the original group and each position:
SELECT grp, position, SUM(val * n) AS s
FROM tbl, unnest(vals) WITH ORDINALITY AS f(val, position)
GROUP BY grp, position
ORDER BY grp, position;
+---+--------+--+
|grp|position|s |
+---+--------+--+
|0 |1 |2 |
|0 |2 |4 |
|0 |3 |6 |
|0 |4 |8 |
|1 |1 |15|
|1 |2 |25|
|1 |3 |17|
|1 |4 |19|
+---+--------+--+
Then you only need the ARRAY_AGG, as in the full query at the top of this answer.
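The comma join against unnest(...) used above is an implicit lateral join. Written out explicitly and folded into a single statement, the same approach might look like the sketch below; the aliases t, u, pos and per_position are names chosen here for illustration only.

```sql
-- Sketch: same logic as the CTE version, with an explicit LATERAL join.
SELECT grp,
       array_agg(s ORDER BY pos) AS result
FROM (
    -- one row per (group, array position) holding the weighted sum
    SELECT t.grp, u.pos, SUM(u.val * t.n) AS s
    FROM tbl AS t
    CROSS JOIN LATERAL unnest(t.vals) WITH ORDINALITY AS u(val, pos)
    GROUP BY t.grp, u.pos
) AS per_position
GROUP BY grp
ORDER BY grp;
```

Because the set-returning function sits in the FROM clause rather than inside SUM, the aggregate only ever sees scalar values.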
I would write functions for that, otherwise the SQL will get really messy.
One function to multiply all elements with a given value:
create function array_mul(p_input real[], p_mul int)
  returns real[]
as
$$
  select array(select i * p_mul
               from unnest(p_input) with ordinality as t(i, idx)
               order by idx);
$$
language sql
immutable;
And one function, to be used as the state transition function of an aggregate, that sums up the elements with the same index:
create or replace function array_add(p_one real[], p_two real[])
  returns real[]
as
$$
declare
  l_idx int;
  l_result real[];
begin
  -- if either input is null, return the other one unchanged
  if p_one is null or p_two is null then
    return coalesce(p_one, p_two);
  end if;
  -- add element by element; positions missing from the shorter array count as 0
  for l_idx in 1..greatest(cardinality(p_one), cardinality(p_two)) loop
    l_result[l_idx] := coalesce(p_one[l_idx], 0) + coalesce(p_two[l_idx], 0);
  end loop;
  return l_result;
end;
$$
language plpgsql
immutable;
That can be used to define a custom aggregate:
create aggregate array_element_sum(real[]) (
  sfunc = array_add,
  stype = real[],
  initcond = '{}'
);
And then your query is as simple as:
select grp, array_element_sum(array_mul(vals, n))
from tbl
group by grp;
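As a quick sanity check, the aggregate can also be tried on literal arrays. The sketch below assumes array_add and array_element_sum were created exactly as shown above; the aliases t, v and summed are made up for the example.

```sql
-- Quick check on two literal arrays of different lengths.
SELECT array_element_sum(v) AS summed
FROM (VALUES ('{1,2}'::real[]),
             ('{3,4,5}'::real[])) AS t(v);
-- With array_add as defined above, this should give {4,6,5}:
-- the missing third element of the shorter array is treated as 0.
```

Treating missing positions as 0 is a design choice of array_add; if arrays of unequal length should be rejected instead, the function would need an explicit length check.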