简体   繁体   中英

Mysql cumulative sum, with reset condition, group by period

I have a table with each row holding fields named id (key), date, sign, value.

Sign = 0 is a reset condition. It will be used to set initial cumulative value

Sign = +1 is an add condition. It will sum value to cumulative total.

Sign = -1 is a subtraction, and as you'll guess, it will decrease cumulative total.

Order is important, so they must be evalued with 0, 1, -1 ordering.

Let's say I'd want to obtain cumulative sum of the value.

SELECT my_date, my_sign, my_value, @cum AS cum_before,
(@cum := IF( my_sign !=0, @cum + my_sign * my_value, my_value ) ) AS cum_after
FROM my_table, (SELECT @cum :=0) as t WHERE my_date LIKE '2016-05-%'
ORDER BY my_date, my_sign + ( my_sign =0 ) *2 DESC;

Will correctly display:

my_date     my_sign  my_value cum_before    cum_after
2016-05-02        0  10000.00          0    10000.00
2016-05-02        1   1860.00      10000    11860.00
2016-05-02       -1   1860.00      11860    10000.00
2016-05-03        1   1780.00      10000    11780.00
2016-05-06        1   4625.00      11780    16405.00
2016-05-09        1  14200.00      16405    30605.00

Now I'd like to group it by week (or month), and having cum_before set to initial value before processing the rows in the group (which incidentally should be the cum_after of the previous group), and cum_after as the cumulative value after processing the rows in the group. Things gets complicated as previous state variable @cum seems to be initialized with the first value of the group set.

I'm creating a temporary table to keep the correct ordering, as GROUP seems not to obey any ORDER BY clause (i'm guessing it's taking rows as they appears in the DB).

CREATE TEMPORARY TABLE _t_ SELECT id FROM my_table
ORDER BY my_date, my_sign + ( my_sign =0 ) *2 DESC ;

I'm using assign in place (@cum:=value), inside the SUM function, to change @cum accordingly when grouping, and multiply it by 0 not to interfere with real sum, which will sum sign*value in normal case, and will subtract @cum and add value field, when reset condition is found.

SELECT min( my_date ) AS MinDate, max( my_date ) AS MaxDate,
@cum AS cum_before, SUM( 
  0 * ( @cum := IF( my_sign !=0, my_sign * my_value, my_value ) ) + 
  IF( my_sign !=0, my_sign * my_value, - @cum + my_value ) 
) AS cum_after
FROM my_table as F, _t_, (SELECT @cum :=0) AS t
WHERE _t_.id = F.id AND my_date LIKE '2016-05-%'
GROUP BY date_format( my_date, "%y%U" );

will give the following:

MinDate     MaxDate     cum_before cum_after
2016-05-02  2016-05-06       10000  16405.00
2016-05-09  2016-05-09       14200  14200.00

which is wrong, because what I expect to obtain is:

MinDate     MaxDate     cum_before cum_after
2016-05-02  2016-05-06           0  16405.00
2016-05-09  2016-05-09       16405  30605.00

Basically it appears to assign @cum the first row in the set, instead of keeping previous variable value.

How can I obtain the correct grouping, if it's possible?

Adapting a previous answer by Haleemur Ali it appears it cannot be done in a single step, as cumulative variable is reset after each group. So a possible solution is the following:

SELECT MinDate, MaxDate, @cum as cum_before, (@cum:=@cum+tmp_cum) as cum_after
FROM ( SELECT min( my_date ) AS MinDate, max( my_date ) AS MaxDate,
SUM( 
  0 * ( @tmp := IF( my_sign !=0, my_sign * my_value, my_value ) ) + 
  IF( my_sign !=0, my_sign * my_value, - @tmp + my_value ) 
) AS tmp_cum
FROM my_table as F, _t_, (SELECT @tmp :=0) AS t
WHERE _t_.id = F.id AND my_date LIKE '2016-05-%'
GROUP BY date_format( my_date, "%y%U" ) ) as SUBQ, (SELECT @cum:=0) as tmp_var;

Don't know about performance, or if this can be optimized.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM