
Want to write a query using a running sum in Postgres

I am having trouble writing a query for Postgres (strictly speaking, it's Redshift).
The table data is shown below.
The rows are listed with PARTITION BY user_id ORDER BY created_at desc.

data

user_id| x | y |  min |     created_at      
-------+---+---+------+---------------------
      1| 1 | 1 |    1 | 2015-01-15 17:26:53
      1| 1 | 1 |    2 | 2015-01-15 17:26:54
      1| 1 | 1 |    3 | 2015-01-15 17:26:55
      1| 2 | 1 |   10 | 2015-01-16 02:46:21
      1| 1 | 1 |   15 | 2015-01-16 02:46:22
      1| 3 | 3 |   11 | 2015-01-16 03:01:44
      1| 3 | 3 |    2 | 2015-01-16 03:02:06
      2| 1 | 1 |    3 | 2015-01-16 03:02:12
      2| 2 | 1 |    4 | 2015-01-16 03:02:15
      2| 2 | 1 |    7 | 2015-01-16 03:02:18

What I want is the following:

ideal result

user_id| x | y |  sum_min |
-------+---+---+----------+
      1| 1 | 1 |        6 |
      1| 2 | 1 |       10 |
      1| 1 | 1 |       15 |
      1| 3 | 3 |       13 |
      2| 1 | 1 |        3 |
      2| 2 | 1 |       11 |

If I simply use GROUP BY user_id, x, y, the result will be

 user_id| x | y |  sum_min |
 -------+---+---+----------+
       1| 1 | 1 |       21 |
       :| : | : |        : |

This is not what I want :(
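
For reference, the plain grouping described above would look roughly like the query below. The table name data is an assumption, since the question does not name the table. Because GROUP BY ignores row order, every row with user_id = 1, x = 1, y = 1 lands in the same group, which is where 21 (1 + 2 + 3 + 15) comes from.

```sql
-- Assumed table name: data (the question does not give one).
SELECT user_id, x, y, SUM(min) AS sum_min
FROM data
GROUP BY user_id, x, y
ORDER BY user_id, x, y;
```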

Try this:

with cte as (
  select user_id, x, y, created_at,
         -- sum min over all rows of the same user, x, y and calendar day
         sum(min) over (partition by user_id, x, y, day_key) as sum_min
  from (
    select user_id, x, y, min, created_at,
           -- explicit alias instead of relying on the implicit column name "replace"
           replace(created_at::date::text, '-', '') as day_key
    from usr
  ) t
)
select user_id, x, y, sum_min
from cte
group by sum_min, user_id, x, y
order by user_id

Maybe try grouping it by the creation date as well:

select user_id, x, y, sum(min), created_at::date from test
group by user_id, x, y, created_at::date
order by user_id, x, y, created_at

It seems that you want to calculate an aggregate over clusters of consecutive records: records that share the same values in three columns and are adjacent in the given order, so that a cluster ends as soon as any of those three values changes. A plain GROUP BY cannot do that, because grouping does not take the order of the records into account. Ordering by date does not change that: GROUP BY simply does not support this kind of stratification.

The only option that I am aware of is to create a plpgsql function with a cursor on your data relation (presumably a view, but would work equally well with a table). You iterate over all the records in the relation and for each cluster encountered sum up the min values and output a new record with the clustering columns and the summed value.

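CREATE FUNCTION sum_clusters()
RETURNS TABLE (user_id int, x int, y int, sum_min int) AS $$
DECLARE
  data_row data%ROWTYPE;
  -- Row order defines the clusters, so the cursor must sort explicitly.
  -- Columns are qualified with d. so they cannot clash with the output columns.
  cur CURSOR FOR SELECT d.* FROM data AS d ORDER BY d.user_id, d.created_at;
  cur_user integer;
  cur_x integer;
  cur_y integer;
  total integer;
BEGIN
  OPEN cur;
  FETCH cur INTO data_row;
  LOOP
    IF NOT FOUND THEN
      EXIT;
    END IF;
    -- Start a new cluster at the current row.
    cur_user := data_row.user_id;
    cur_x := data_row.x;
    cur_y := data_row.y;
    total := data_row.min;
    LOOP
      FETCH cur INTO data_row;
      IF NOT FOUND THEN
        EXIT;
      END IF;
      IF (data_row.user_id = cur_user) AND (data_row.x = cur_x) AND (data_row.y = cur_y) THEN
        -- PL/pgSQL has no += operator.
        total := total + data_row.min;
      ELSE
        EXIT;
      END IF;
    END LOOP;
    -- Emit one row per cluster through the output columns.
    user_id := cur_user;
    x := cur_x;
    y := cur_y;
    sum_min := total;
    RETURN NEXT;
  END LOOP;
  CLOSE cur;
  RETURN;
END;
$$ LANGUAGE plpgsql;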

That is a lot of code and not particularly fast, but it should work.
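
To try the function against the sample rows from the question, a setup along these lines might help. The column types are assumptions (the question only shows values), and the table is named data because that is what the function reads from. Create the table before the function, since the function declares a variable of type data%ROWTYPE.

```sql
-- Assumed column types; the question only shows the values.
CREATE TABLE data (
  user_id    integer,
  x          integer,
  y          integer,
  min        integer,
  created_at timestamp
);

-- Sample rows copied from the question.
INSERT INTO data (user_id, x, y, min, created_at) VALUES
  (1, 1, 1,  1, '2015-01-15 17:26:53'),
  (1, 1, 1,  2, '2015-01-15 17:26:54'),
  (1, 1, 1,  3, '2015-01-15 17:26:55'),
  (1, 2, 1, 10, '2015-01-16 02:46:21'),
  (1, 1, 1, 15, '2015-01-16 02:46:22'),
  (1, 3, 3, 11, '2015-01-16 03:01:44'),
  (1, 3, 3,  2, '2015-01-16 03:02:06'),
  (2, 1, 1,  3, '2015-01-16 03:02:12'),
  (2, 2, 1,  4, '2015-01-16 03:02:15'),
  (2, 2, 1,  7, '2015-01-16 03:02:18');

-- Once sum_clusters() exists, call it like any set-returning function.
SELECT * FROM sum_clusters();
```

If the function behaves as intended, the output should match the ideal result listed in the question.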
