Aggregate functions over arrays

I have a table like this:

+-----+----------------+
| ID  |  array300      |
+-----+----------------+
| 100 | {110,25,53,..} |
| 101 | {56,75,59,...} |
| 102 | {65,93,82,...} |
| 103 | {75,70,80,...} |
+-----+----------------+

The array300 column is an array of 300 elements. I need arrays of 100 elements, where every element is the average of 3 consecutive elements of array300. For this example the answer would look like:
array100
{62.66,...}
{63.33,...}
{80,...}
{75,...}

Try something like this:

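SELECT t.id, u.val, ntile(100) OVER (PARTITION BY t.id ORDER BY u.ord) AS bucket_num
FROM your_table t
CROSS JOIN LATERAL unnest(t.array300) WITH ORDINALITY AS u(val, ord)

This SELECT gives you 300 records per array300, all with the same id, and assigns each one a bucket_num (1 for the first 3 elements, 2 for the next 3, and so on). The ORDER BY on the element position keeps the buckets in array order; WITH ORDINALITY needs PostgreSQL 9.4 or later.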

Then use this select to get the avg of elements in the bucket:

SELECT id, bucket_num, avg(val) AS avg_val
FROM (...previous select here...) AS buckets
GROUP BY id, bucket_num

Next, aggregate the avg_val values into an array, ordered by bucket:

SELECT id, array_agg(avg_val ORDER BY bucket_num) AS array100
FROM (...previous select here...) AS averages
GROUP BY id

Details: unnest, ntile, array_agg, OVER (PARTITION BY)
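
To see what the three steps do without touching the database, the same pipeline can be mimicked in plain Python. This is only a sketch: the helper names `ntile` and `bucket_averages` are made up for illustration, and `ntile` imitates the window function's behaviour of splitting ordered rows into buckets that are as equal as possible, with any larger buckets first.

```python
from collections import defaultdict


def ntile(n_rows, n_buckets):
    """Bucket number for each of n_rows ordered rows, like ntile(n_buckets)."""
    base, extra = divmod(n_rows, n_buckets)
    numbers = []
    for bucket in range(1, n_buckets + 1):
        # The first `extra` buckets receive one row more than the rest.
        size = base + (1 if bucket <= extra else 0)
        numbers.extend([bucket] * size)
    return numbers


def bucket_averages(values, n_buckets):
    """unnest -> ntile -> avg per bucket -> array_agg ordered by bucket."""
    groups = defaultdict(list)
    for value, bucket in zip(values, ntile(len(values), n_buckets)):
        groups[bucket].append(value)
    return [sum(group) / len(group) for _, group in sorted(groups.items())]


sample = [110, 25, 53, 56, 75, 59]  # two buckets of three elements each
print(bucket_averages(sample, 2))
```

With 300 elements and 100 buckets every bucket holds exactly 3 rows; if the array length is not a multiple of the bucket count, the leading buckets get one extra element.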

UPD: Try this function:

CREATE OR REPLACE FUNCTION public.array300_to_100 (
  p_array300 numeric []
)
RETURNS numeric [] AS
$body$
DECLARE
  dim_start int = array_length(p_array300, 1); --size of input array
  dim_end int = 100; -- size of output array
  dim_step int = dim_start / dim_end; --avg batch size
  tmp_sum NUMERIC; --sum of the batch
  result_array NUMERIC[100]; -- resulting array
BEGIN

  FOR i IN 1..dim_end LOOP --from 1 to 100.
    tmp_sum = 0;

    FOR j IN (1+(i-1)*dim_step)..i*dim_step LOOP --from 1 to 3, 4 to 6, ...
      tmp_sum = tmp_sum + p_array300[j];  
    END LOOP; 

    result_array[i] = tmp_sum / dim_step;
  END LOOP; 

  RETURN result_array;
END;
$body$
LANGUAGE 'plpgsql'
IMMUTABLE
RETURNS NULL ON NULL INPUT;

It takes one array300 and outputs one array100. To use it:

SELECT id, array300_to_100(array300)
FROM table1;

If you have any problems understanding it - just ask me.
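
For anyone who wants to check the loop bounds outside PostgreSQL, here is a rough Python port of the function above. It is a sketch under a few assumptions: the `out_len` parameter is added here only so that small inputs can be tried, list indexes are 0-based (hence the `first - 1`), and the explicit error stands in for the division by zero that the PL/pgSQL version would run into when the input has fewer elements than the output.

```python
def array300_to_100(values, out_len=100):
    """Average consecutive batches of len(values) // out_len elements."""
    step = len(values) // out_len  # dim_step: integer division, as in the function
    if step == 0:
        raise ValueError("input is shorter than the requested output length")
    result = []
    for i in range(1, out_len + 1):
        first = 1 + (i - 1) * step  # 1-based start of batch i
        last = i * step             # 1-based end of batch i
        batch = values[first - 1:last]
        result.append(sum(batch) / step)
    return result


print(array300_to_100([110, 25, 53, 56, 75, 59], out_len=2))
```

Because the step uses integer division, trailing elements are silently ignored when the input length is not a multiple of the output length (7 elements with `out_len=2` averages only the first 6).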

Putting Igor's pieces together in another form:

 select id, array300, (
    select array_agg(y.avg order by y.bucket) from
    (
        select x.bucket, avg(x.val) as avg from
        (
            select u.val, ntile(array_length(array300,1)/3) over (order by u.ord) as bucket
            from unnest(array300) with ordinality as u(val, ord)
        ) x
        group by x.bucket
    ) y
) array100
from your_table

For a small example table like this:

 id |       array300        
----+-----------------------
  1 | {110,25,53,110,25,53}
  2 | {56,75,59,110,25,53}
  3 | {65,93,82,110,25,53}
  4 | {75,70,80,110,25,53}

the result is:

 id |       array300        |                 array100                  
----+-----------------------+-------------------------------------------
  1 | {110,25,53,110,25,53} | {62.6666666666666667,62.6666666666666667}
  2 | {56,75,59,110,25,53}  | {63.3333333333333333,62.6666666666666667}
  3 | {65,93,82,110,25,53}  | {80.0000000000000000,62.6666666666666667}
  4 | {75,70,80,110,25,53}  | {75.0000000000000000,62.6666666666666667}
(4 rows)

Edit: My first version used a fixed ntile(2), which only worked for source arrays of size 6. I've fixed that by using array_length(array300,1)/3 instead.

I can't answer your question completely, but I have found an aggregate function for summing integer arrays. Perhaps someone (or you) can modify it to compute an average.

Source: http://archives.postgresql.org/pgsql-sql/2005-04/msg00402.php

CREATE OR REPLACE FUNCTION array_add(int[],int[]) RETURNS int[] AS '
  DECLARE
    x ALIAS FOR $1;
    y ALIAS FOR $2;
    a int;
    b int;
    i int;
    res int[];
  BEGIN
    res = x;

    a := array_lower (y, 1);
    b := array_upper (y, 1);

    IF a IS NOT NULL THEN
      FOR i IN a .. b LOOP
        res[i] := coalesce(res[i],0) + y[i];
      END LOOP;
    END IF;

    RETURN res;
  END;
'
LANGUAGE plpgsql STRICT IMMUTABLE;

--- then this aggregate lets me sum integer arrays...

CREATE AGGREGATE sum_integer_array (
    sfunc = array_add,
    basetype = INTEGER[],
    stype = INTEGER[],
    initcond = '{}'
);


Here's how my sample table looked, and my new array-summing aggregate and function in use:

#SELECT * FROM arraytest ;
 id | somearr
----+---------
 a  | {1,2,3}
 b  | {0,1,2}
(2 rows)

#SELECT sum_integer_array(somearr) FROM arraytest ;
 sum_integer_array
-------------------
 {1,3,5}
(1 row)
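
What the aggregate does can be reproduced in a few lines of Python, which may help when adapting it. This is a sketch that assumes no NULL elements; the Python names simply mirror the SQL ones. Note that it combines arrays from different rows position by position, which is a different operation from averaging groups of elements inside one array.

```python
from functools import reduce
from itertools import zip_longest


def array_add(x, y):
    """Element-wise sum; a missing element on either side counts as 0."""
    return [a + b for a, b in zip_longest(x, y, fillvalue=0)]


def sum_integer_array(rows):
    """Fold array_add over all rows, starting from the empty array ('{}')."""
    return reduce(array_add, rows, [])


print(sum_integer_array([[1, 2, 3], [0, 1, 2]]))  # prints [1, 3, 5]
```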

Is this any faster? Note that it averages elements a, a+100 and a+200 of each array, not three adjacent elements.

Edit: This is more elegant:

with  t as (select generate_series(1, 100,1) a , generate_series(101,200,1) b , generate_series(201,300,1) c)

    select 
        id,
        array_agg((array300[a] + array300[b] + array300[c]) / 3::numeric order by a)  as avg
    from 
        t,
        tmp.test2
    group by 
        id

End of edit

Edit2: This is the shortest select I can think of:

select 
    id,
    array_agg((array300[a] + array300[a+100] + array300[a+200]) / 3::numeric order by a)  as avg
from 
    (select generate_series(1, 100,1) a) t,
    tmp.test2
group by 
    id

End of edit2
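
The selects above combine array300[a], array300[a+100] and array300[a+200]. A small Python sketch (function names invented for illustration, 0-based list indexes) shows how that differs from averaging adjacent triples as in the question's sample output. In SQL terms, the adjacent version would presumably index array300[3*a-2], array300[3*a-1] and array300[3*a] instead.

```python
def strided_averages(values, n_out=100):
    """Average of elements a, a+n_out and a+2*n_out (the pattern used above)."""
    return [
        (values[a] + values[a + n_out] + values[a + 2 * n_out]) / 3
        for a in range(n_out)
    ]


def neighbour_averages(values, n_out=100):
    """Average of three adjacent elements, as in the question's example."""
    return [
        (values[3 * a] + values[3 * a + 1] + values[3 * a + 2]) / 3
        for a in range(n_out)
    ]


data = list(range(1, 301))           # stand-in for one array300 value
print(strided_averages(data)[:3])    # [101.0, 102.0, 103.0]
print(neighbour_averages(data)[:3])  # [2.0, 5.0, 8.0]
```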

with 

t as (select generate_series(1, 100,1) a , generate_series(101,200,1) b , generate_series(201,300,1) c)

,u as (
    select 
        id,
        a,
        (array300[a] + array300[b] + array300[c]) / 3::numeric as avg
    from 
        t,
        tmp.test2 /* table with arrays - id, array300 */
    order by 
        id,
        a
 )

select 
    id, 
    array_agg(avg order by a)
from 
    u 
group by 
    id

You can always test whether a hardcoded variant performs better:

select 

array[
(input_array[1] + input_array[101] + input_array[201])/3::numeric,
(input_array[2] + input_array[102] + input_array[202])/3::numeric,
(input_array[3] + input_array[103] + input_array[203])/3::numeric,
(input_array[4] + input_array[104] + input_array[204])/3::numeric,
(input_array[5] + input_array[105] + input_array[205])/3::numeric,
(input_array[6] + input_array[106] + input_array[206])/3::numeric,
(input_array[7] + input_array[107] + input_array[207])/3::numeric,
(input_array[8] + input_array[108] + input_array[208])/3::numeric,
(input_array[9] + input_array[109] + input_array[209])/3::numeric,
(input_array[10] + input_array[110] + input_array[210])/3::numeric,
(input_array[11] + input_array[111] + input_array[211])/3::numeric,
(input_array[12] + input_array[112] + input_array[212])/3::numeric,
(input_array[13] + input_array[113] + input_array[213])/3::numeric,
(input_array[14] + input_array[114] + input_array[214])/3::numeric,
(input_array[15] + input_array[115] + input_array[215])/3::numeric,
(input_array[16] + input_array[116] + input_array[216])/3::numeric,
(input_array[17] + input_array[117] + input_array[217])/3::numeric,
(input_array[18] + input_array[118] + input_array[218])/3::numeric,
(input_array[19] + input_array[119] + input_array[219])/3::numeric,
(input_array[20] + input_array[120] + input_array[220])/3::numeric,
(input_array[21] + input_array[121] + input_array[221])/3::numeric,
(input_array[22] + input_array[122] + input_array[222])/3::numeric,
(input_array[23] + input_array[123] + input_array[223])/3::numeric,
(input_array[24] + input_array[124] + input_array[224])/3::numeric,
(input_array[25] + input_array[125] + input_array[225])/3::numeric,
(input_array[26] + input_array[126] + input_array[226])/3::numeric,
(input_array[27] + input_array[127] + input_array[227])/3::numeric,
(input_array[28] + input_array[128] + input_array[228])/3::numeric,
(input_array[29] + input_array[129] + input_array[229])/3::numeric,
(input_array[30] + input_array[130] + input_array[230])/3::numeric,
(input_array[31] + input_array[131] + input_array[231])/3::numeric,
(input_array[32] + input_array[132] + input_array[232])/3::numeric,
(input_array[33] + input_array[133] + input_array[233])/3::numeric,
(input_array[34] + input_array[134] + input_array[234])/3::numeric,
(input_array[35] + input_array[135] + input_array[235])/3::numeric,
(input_array[36] + input_array[136] + input_array[236])/3::numeric,
(input_array[37] + input_array[137] + input_array[237])/3::numeric,
(input_array[38] + input_array[138] + input_array[238])/3::numeric,
(input_array[39] + input_array[139] + input_array[239])/3::numeric,
(input_array[40] + input_array[140] + input_array[240])/3::numeric,
(input_array[41] + input_array[141] + input_array[241])/3::numeric,
(input_array[42] + input_array[142] + input_array[242])/3::numeric,
(input_array[43] + input_array[143] + input_array[243])/3::numeric,
(input_array[44] + input_array[144] + input_array[244])/3::numeric,
(input_array[45] + input_array[145] + input_array[245])/3::numeric,
(input_array[46] + input_array[146] + input_array[246])/3::numeric,
(input_array[47] + input_array[147] + input_array[247])/3::numeric,
(input_array[48] + input_array[148] + input_array[248])/3::numeric,
(input_array[49] + input_array[149] + input_array[249])/3::numeric,
(input_array[50] + input_array[150] + input_array[250])/3::numeric,
(input_array[51] + input_array[151] + input_array[251])/3::numeric,
(input_array[52] + input_array[152] + input_array[252])/3::numeric,
(input_array[53] + input_array[153] + input_array[253])/3::numeric,
(input_array[54] + input_array[154] + input_array[254])/3::numeric,
(input_array[55] + input_array[155] + input_array[255])/3::numeric,
(input_array[56] + input_array[156] + input_array[256])/3::numeric,
(input_array[57] + input_array[157] + input_array[257])/3::numeric,
(input_array[58] + input_array[158] + input_array[258])/3::numeric,
(input_array[59] + input_array[159] + input_array[259])/3::numeric,
(input_array[60] + input_array[160] + input_array[260])/3::numeric,
(input_array[61] + input_array[161] + input_array[261])/3::numeric,
(input_array[62] + input_array[162] + input_array[262])/3::numeric,
(input_array[63] + input_array[163] + input_array[263])/3::numeric,
(input_array[64] + input_array[164] + input_array[264])/3::numeric,
(input_array[65] + input_array[165] + input_array[265])/3::numeric,
(input_array[66] + input_array[166] + input_array[266])/3::numeric,
(input_array[67] + input_array[167] + input_array[267])/3::numeric,
(input_array[68] + input_array[168] + input_array[268])/3::numeric,
(input_array[69] + input_array[169] + input_array[269])/3::numeric,
(input_array[70] + input_array[170] + input_array[270])/3::numeric,
(input_array[71] + input_array[171] + input_array[271])/3::numeric,
(input_array[72] + input_array[172] + input_array[272])/3::numeric,
(input_array[73] + input_array[173] + input_array[273])/3::numeric,
(input_array[74] + input_array[174] + input_array[274])/3::numeric,
(input_array[75] + input_array[175] + input_array[275])/3::numeric,
(input_array[76] + input_array[176] + input_array[276])/3::numeric,
(input_array[77] + input_array[177] + input_array[277])/3::numeric,
(input_array[78] + input_array[178] + input_array[278])/3::numeric,
(input_array[79] + input_array[179] + input_array[279])/3::numeric,
(input_array[80] + input_array[180] + input_array[280])/3::numeric,
(input_array[81] + input_array[181] + input_array[281])/3::numeric,
(input_array[82] + input_array[182] + input_array[282])/3::numeric,
(input_array[83] + input_array[183] + input_array[283])/3::numeric,
(input_array[84] + input_array[184] + input_array[284])/3::numeric,
(input_array[85] + input_array[185] + input_array[285])/3::numeric,
(input_array[86] + input_array[186] + input_array[286])/3::numeric,
(input_array[87] + input_array[187] + input_array[287])/3::numeric,
(input_array[88] + input_array[188] + input_array[288])/3::numeric,
(input_array[89] + input_array[189] + input_array[289])/3::numeric,
(input_array[90] + input_array[190] + input_array[290])/3::numeric,
(input_array[91] + input_array[191] + input_array[291])/3::numeric,
(input_array[92] + input_array[192] + input_array[292])/3::numeric,
(input_array[93] + input_array[193] + input_array[293])/3::numeric,
(input_array[94] + input_array[194] + input_array[294])/3::numeric,
(input_array[95] + input_array[195] + input_array[295])/3::numeric,
(input_array[96] + input_array[196] + input_array[296])/3::numeric,
(input_array[97] + input_array[197] + input_array[297])/3::numeric,
(input_array[98] + input_array[198] + input_array[298])/3::numeric,
(input_array[99] + input_array[199] + input_array[299])/3::numeric,
(input_array[100] + input_array[200] + input_array[300])/3::numeric
]

from tmp.test
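
Rather than typing the 100 expressions by hand, the statement can be generated. Below is a minimal Python sketch; `hardcoded_select` is a made-up helper, and the default column and table names are just the ones used in the select above.

```python
def hardcoded_select(column="input_array", table="tmp.test", n_out=100):
    """Build the text of the hardcoded select shown above."""
    rows = [
        f"({column}[{a}] + {column}[{a + n_out}] + {column}[{a + 2 * n_out}])/3::numeric"
        for a in range(1, n_out + 1)
    ]
    return "select\n\narray[\n" + ",\n".join(rows) + "\n]\n\nfrom " + table


sql = hardcoded_select()
print(sql.splitlines()[3])  # the first array element expression
```

The generated text can then be pasted into psql or sent through whatever client you normally use.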

