简体   繁体   中英

Sum across partitions with window functions

I have the following problem...

    Time |  A   |  B |  C  --  Sum should be
      1     a1     b1   c1     a1 + b1 + c1
      2     a2     b2   x      a2 + b1 + c1 
      3     a3     x    x      a3 + b2 + c1
      4     x      b3   c2     a3 + b3 + c2

Essentially, the sum needs to be across the most recent value in time for each of the three rows. Each data column doesn't necessarily have a value for the current time.

I have tried several approaches using window functions and have been unsuccessful. I have written a stored procedure that does what I need, but it is SLOW.

CREATE OR REPLACE FUNCTION timeseries.combine_series(id int[], startTime timestamp, endTime timestamp) 
RETURNS setof RECORD AS $$
DECLARE
    retval double precision = 0;
    row_data timeseries.total_active_energy%ROWTYPE;
    maxCount integer = 0;
    sz integer = 0;
lastVal double precision[];
v_rec RECORD;
BEGIN   
    SELECT INTO sz array_length($1,1);

    FOR row_data IN SELECT * FROM timeseries.total_active_energy  WHERE time >= startTime AND time < endTime AND device_id = ANY($1) ORDER BY time
       LOOP
    retval = row_data.active_power;
    for i IN 1..sz LOOP
        IF $1[i]=row_data.device_id THEN
            lastVal[i] = row_data.active_power;
        ELSE
            retval = retVal + COALESCE(lastVal[i],0);
        END IF;
    END LOOP;

    SELECT row_data.time, retval into v_rec;

    return next v_rec;
     END LOOP;

      return ;
  END;
$$ LANGUAGE plpgsql;

Call:

select * from timeseries.combine_series('{552,553,554}'::int[], '2013-05-01'::timestamp, '2013-05-02'::timestamp) 
    AS (t timestamp with time zone, val double precision);
SELECT ts, a, b, c
       , COALESCE(max(a) OVER (PARTITION BY grp_a), 0)
       + COALESCE(max(b) OVER (PARTITION BY grp_b), 0)
       + COALESCE(max(c) OVER (PARTITION BY grp_c), 0) AS special_sum
FROM  (
   SELECT *
         ,count(a) OVER w AS grp_a
         ,count(b) OVER w AS grp_b
         ,count(c) OVER w AS grp_c
   FROM   t
   WINDOW w AS (ORDER BY ts)
   ) sub
ORDER  BY ts;

First, put actual values and following NULL values in a group with the aggregate window function count() : it does not increment with NULL values.

Then take max() from every group, arriving at what you are looking for. At this point you could just as well use min() or sum() , since there is only one non-null value per group.

COALESCE() catches NULL values if the overall first value in time is NULL .

Note how I picked ts as column name, since I don't use base type names like time as identifiers.

Test case

That's also how you everyone should provide sample data in the first place!

CREATE TEMP TABLE t (ts int, a int, b int, c int);

INSERT INTO t VALUES
  (1, 11,   21,   NULL)
 ,(2, 12,   22,   NULL)
 ,(3, 13,   NULL, NULL)
 ,(4, NULL, 23,   32);

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM