How to generate an MTD (month-to-date) rolling sum in Hive?

I expected someone to have asked this before, but somehow I couldn't find anything. Please let me know if this is a duplicate.

Let's say I have a table in the format below:

| event_date | v |
|------------+---|
| 2021-01-01 | 1 |
| 2021-01-02 | 1 |
| .......... | . |
| 2021-01-31 | 1 |
| 2021-02-01 | 1 |
| 2021-02-02 | 1 |

I would like to calculate the rolling sum of v within each month, so the output would look like this (assume v=1 for all rows):

| event_date | v | cum_v |
|------------+---+-------|
| 2021-01-01 | 1 |     1 |
| 2021-01-02 | 1 |     2 |
| .......... | . |     . |
| 2021-01-31 | 1 |    31 |
| 2021-02-01 | 1 |     1 |
| 2021-02-02 | 1 |     2 |

This is similar to calculating a cumulative sum over a rolling 30 days, except that the window is the calendar month, so the sum restarts on the first day of each month.

I think this can be done using Hive's window functions and interval, but I wasn't able to find any useful documentation on interval. I also plan to do QTD and YTD rollups, so I hope to do this in a flexible manner.

Use an analytic function with partition by substr(event_date, 1, 7) order by event_date:

select t.*, sum(v) over (partition by substr(event_date, 1, 7) order by event_date) as rolling_sum
from t;
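One detail worth spelling out: when a window has an ORDER BY but no explicit frame, Hive defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that share the same event_date all receive the same running total. With one row per date, as in the question, this makes no difference. The sketch below states the frame explicitly with ROWS, which accumulates row by row instead. It assumes the table is named t (the name is not given in the question) and that event_date is stored as a 'yyyy-MM-dd' string, so the first seven characters identify the month.

```sql
-- Sketch only: table name t and the string format of event_date are assumptions.
select event_date,
       v,
       sum(v) over (partition by substr(event_date, 1, 7)  -- 'yyyy-MM' identifies the month
                    order by event_date
                    rows between unbounded preceding and current row  -- explicit running frame
                   ) as cum_v
from t;
```

If several rows can share a date, the order among them is not deterministic under ROWS unless a tie-breaking column is added to the order by.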

You want a cumulative sum, which looks like this:

select t.*,
       sum(v) over (partition by year(event_date), month(event_date)
                    order by event_date
                   ) as mtd
from t;

This is easily generalized to YTD and QTD by changing the partition by clause.
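As a rough sketch of that generalization, the three running totals can sit side by side, each with its own partition. This assumes the question's column names (event_date, v) and a table named t. It also relies on Hive's quarter() function, which returns 1 to 4 and is available from Hive 1.3.0 onward.

```sql
-- Sketch only: table name t is an assumption; columns follow the question.
select t.*,
       -- month to date: restart the running total every calendar month
       sum(v) over (partition by year(event_date), month(event_date)
                    order by event_date) as mtd,
       -- quarter to date: restart every calendar quarter
       sum(v) over (partition by year(event_date), quarter(event_date)
                    order by event_date) as qtd,
       -- year to date: restart every calendar year
       sum(v) over (partition by year(event_date)
                    order by event_date) as ytd
from t;
```

Only the partition by expression changes between the three columns, which keeps the approach flexible without needing interval arithmetic.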

Or, if you prefer a somewhat shorter form:

       sum(v) over (partition by last_day(event_date)
                    order by event_date
                   ) as mtd

I strongly recommend that you use date functions on date columns.
