window function rolling sum

I have a window function that gives me a rolling sum as below:

    SELECT start_terminal,
           duration_seconds,
           start_time,
           sum(duration_seconds) OVER
             (PARTITION BY start_terminal ORDER BY start_time)
             AS running_total
      FROM tutorial.dc_bikeshare_q1_2012
     WHERE start_time < '2012-01-08'

However, when several rows share the same start_time, they all get the same running total. That makes sense because I am ordering by start_time, but it means duration_seconds is not added row by row, as shown below. How can I fix or account for this?

Current output:

    start_terminal  duration_seconds  start_time           running_total
    31000           74                2012-01-01 15:32:00  74
    31000           291               2012-01-02 12:40:00  365
    31000           520               2012-01-02 19:15:00  885
    31000           424               2012-01-03 07:22:00  1756
    31000           447               2012-01-03 07:22:00  1756
    31000           1422              2012-01-03 12:32:00  3178
    31000           348               2012-01-04 17:36:00  3526

Desired output (each row adds only its own duration_seconds, so the two 07:22:00 rows get different totals and the final total is still 3526):

    start_terminal  duration_seconds  start_time           running_total
    31000           74                2012-01-01 15:32:00  74
    31000           291               2012-01-02 12:40:00  365
    31000           520               2012-01-02 19:15:00  885
    31000           424               2012-01-03 07:22:00  1309
    31000           447               2012-01-03 07:22:00  1756
    31000           1422              2012-01-03 12:32:00  3178
    31000           348               2012-01-04 17:36:00  3526

If you add the duration_seconds column to the ORDER BY of the window, rows that share a start_time but have different durations are no longer treated as ties. That should give you what you're looking for.

    SELECT start_terminal,
           duration_seconds,
           start_time,
           sum(duration_seconds) OVER
             (PARTITION BY start_terminal ORDER BY start_time, duration_seconds)
             AS running_total
      FROM tutorial.dc_bikeshare_q1_2012
     WHERE start_time < '2012-01-08'

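It is not clear how you want to resolve the tie. The default window frame when an ORDER BY is present is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which includes every row that ties with the current row on the ORDER BY key. You seem to want ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which stops at the current row itself: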

    SELECT start_terminal, duration_seconds, start_time,
           sum(duration_seconds) OVER (
               PARTITION BY start_terminal
               ORDER BY start_time
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
      FROM tutorial.dc_bikeshare_q1_2012
     WHERE start_time < '2012-01-08';
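
The difference between the two frames can be checked on the seven rows from the question. The following is only a sketch: it loads those rows into an in-memory SQLite table (the table name rides and the helper running_totals are made up for this example, and SQLite 3.25 or later is assumed for window function support) rather than querying tutorial.dc_bikeshare_q1_2012. Totals are sorted before printing because the order of the two tied rows under a ROWS frame is not guaranteed.

```python
import sqlite3

# The seven rows shown in the question (terminal 31000 only).
rows = [
    (31000, 74, "2012-01-01 15:32:00"),
    (31000, 291, "2012-01-02 12:40:00"),
    (31000, 520, "2012-01-02 19:15:00"),
    (31000, 424, "2012-01-03 07:22:00"),
    (31000, 447, "2012-01-03 07:22:00"),
    (31000, 1422, "2012-01-03 12:32:00"),
    (31000, 348, "2012-01-04 17:36:00"),
]

# "rides" is a stand-in table name for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rides ("
    "start_terminal INTEGER, duration_seconds INTEGER, start_time TEXT)"
)
conn.executemany("INSERT INTO rides VALUES (?, ?, ?)", rows)


def running_totals(frame):
    """Return the sorted running totals for the given frame clause."""
    sql = f"""
        SELECT sum(duration_seconds) OVER (
                   PARTITION BY start_terminal
                   ORDER BY start_time
                   {frame}
               ) AS running_total
          FROM rides
    """
    return sorted(row[0] for row in conn.execute(sql))


default_frame = running_totals("")  # no frame clause at all
range_frame = running_totals("RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW")
rows_frame = running_totals("ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW")

print(default_frame)  # the two tied rows both show 1756
print(range_frame)    # identical to the default frame
print(rows_frame)     # seven distinct totals, ending at 3526
```

With the ROWS frame, which of the two 07:22:00 rows is counted first is up to the database, so the total on the first tied row can be either 1309 or 1332 unless the ORDER BY is made unique.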

If you want a secondary key on duration_seconds, you can add that to the ORDER BY. However, you'll have the same issue if two rows have the same values for both columns. If you had an id column or a created_at column, that could be used as a tie-breaker.
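
To illustrate the tie-breaker point, here is a small sketch using SQLite from Python. Everything in it is an assumption made for the example: the rides table, the ride_id key, and the four rows (two of which are deliberately identical on both start_time and duration_seconds) are invented and do not come from the bikeshare dataset. SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE rides (
        ride_id          INTEGER PRIMARY KEY,  -- hypothetical unique key
        start_terminal   INTEGER,
        duration_seconds INTEGER,
        start_time       TEXT
    )
    """
)
# Made-up rows: the second and third tie on BOTH start_time and duration.
conn.executemany(
    "INSERT INTO rides (start_terminal, duration_seconds, start_time) "
    "VALUES (?, ?, ?)",
    [
        (31000, 520, "2012-01-02 19:15:00"),
        (31000, 424, "2012-01-03 07:22:00"),
        (31000, 424, "2012-01-03 07:22:00"),
        (31000, 1422, "2012-01-03 12:32:00"),
    ],
)


def running_totals(order_by):
    """Running totals with the default frame for the given window ORDER BY."""
    sql = f"""
        SELECT sum(duration_seconds) OVER (
                   PARTITION BY start_terminal
                   ORDER BY {order_by}
               ) AS running_total
          FROM rides
         ORDER BY start_time, ride_id
    """
    return [row[0] for row in conn.execute(sql)]


by_duration = running_totals("start_time, duration_seconds")
by_ride_id = running_totals("start_time, ride_id")

print(by_duration)  # [520, 1368, 1368, 2790] -> the duplicates still tie
print(by_ride_id)   # [520, 944, 1368, 2790]  -> unique key breaks the tie
```

Because ride_id is unique, no two rows are peers in the window ordering, so the result is the same row-by-row total under either the default RANGE frame or an explicit ROWS frame, and it is deterministic.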
