
How to generate an MTD (Month to Date) rolling sum in Hive?

I expected someone to have asked this before, but somehow I couldn't find anything. Please let me know if this is a duplicate.

So let's say I have a table in the format below:

| event_date | v |
|------------+---|
| 2021-01-01 | 1 |
| 2021-01-02 | 1 |
| .......... | . |
| 2021-01-31 | 1 |
| 2021-02-01 | 1 |
| 2021-02-02 | 1 |

I would like to calculate the rolling sum within each month, so the output would look like this (assume v=1 for all rows):

| event_date | v | cum_v |
|------------+---+-------|
| 2021-01-01 | 1 |     1 |
| 2021-01-02 | 1 |     2 |
| .......... | . |     . |
| 2021-01-31 | 1 |    31 |
| 2021-02-01 | 1 |     1 |
| 2021-02-02 | 1 |     2 |

This is similar to calculating a cumulative sum over a rolling 30 days, except that the window is the calendar month and resets on the first day of each month.

I think this can be done using Hive's window functions and the interval command, but I wasn't able to find any useful documentation on the interval command. I also want to do QTD and YTD rollups, so I would like a solution that is flexible.

Use an analytic function with partition by substr(event_date, 1, 7) and order by event_date:

select event_date, v,
       sum(v) over (partition by substr(event_date, 1, 7) order by event_date) as rolling_sum
from t;

You want a cumulative sum, which looks like this:

select t.*,
       sum(v) over (partition by year(event_date), month(event_date)
                    order by event_date
                   ) as mtd
from t;

This is easily generalized to YTD and QTD by changing the partition by expression.

Or if you prefer a somewhat shorter form:

       sum(v) over (partition by last_day(event_date)
                    order by event_date
                   ) as mtd

I strongly recommend that you use date functions on date columns.
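
As a rough sketch of the YTD and QTD generalization mentioned above, only the partition by clause needs to change. This assumes the table is named t with columns event_date and v, as in the question, and that your Hive version provides quarter() (available in more recent releases; ceil(month(event_date) / 3) should work as a substitute otherwise). The aliases mtd, qtd and ytd are just illustrative names.

```sql
-- Sketch only: table t(event_date, v) is assumed from the question.
select t.*,
       -- month to date: running total restarts each calendar month
       sum(v) over (partition by year(event_date), month(event_date)
                    order by event_date) as mtd,
       -- quarter to date: running total restarts each calendar quarter
       sum(v) over (partition by year(event_date), quarter(event_date)
                    order by event_date) as qtd,
       -- year to date: running total restarts each calendar year
       sum(v) over (partition by year(event_date)
                    order by event_date) as ytd
from t;
```

If several rows can share the same event_date, note that the default window frame with order by treats them as peers, so every row for that date gets the same running total.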
