Cumulative sum by year_month, location, state in Hive SQL
I want to compute a cumulative count over the recorrencia column, based on the other columns, using Hive SQL.
+------------+---------+-------+--------------+
| t.ano_mes | t.site | t.uf | recorrencia |
+------------+---------+-------+--------------+
| 202001 | 174 | AM | 1 |
| 202002 | 174 | AM | 1 |
| 202003 | 174 | AM | 1 |
| 202004 | 174 | AM | 1 |
| 202005 | 174 | AM | 1 |
| 202006 | 174 | AM | 1 |
| 202007 | 174 | AM | 1 |
| 202008 | 174 | AM | 1 |
| 202005 | 1JN | SP | 1 |
| 202006 | 1JN | SP | 1 |
| 202005 | 1LJ | SP | 1 |
| 202009 | 1LJ | SP | 1 |
| 202001 | 1RG | SP | 1 |
| 202002 | 1RG | SP | 1 |
| 202003 | 1RG | SP | 1 |
| 202004 | 1RG | SP | 1 |
| 202005 | 1RG | SP | 1 |
| 202006 | 1RG | SP | 1 |
| 202007 | 1RG | SP | 1 |
Expected output:
+------------+---------+-------+--------------+---------+
| t.ano_mes | t.site | t.uf | recorrencia | cum_rec |
+------------+---------+-------+--------------+---------+
| 202001 | 174 | AM | 1 | 1 |
| 202002 | 174 | AM | 1 | 2 |
| 202003 | 174 | AM | 1 | 3 |
| 202004 | 174 | AM | 1 | 4 |
| 202005 | 174 | AM | 1 | 5 |
| 202006 | 174 | AM | 1 | 6 |
| 202007 | 174 | AM | 1 | 7 |
| 202008 | 174 | AM | 1 | 8 |
| 202005 | 1JN | SP | 1 | 1 |
| 202006 | 1JN | SP | 1 | 2 |
| 202005 | 1LJ | SP | 1 | 1 |
| 202009 | 1LJ | SP | 1 | 2 |
| 202001 | 1RG | SP | 1 | 1 |
| 202002 | 1RG | SP | 1 | 2 |
| 202003 | 1RG | SP | 1 | 3 |
| 202004 | 1RG | SP | 1 | 4 |
| 202005 | 1RG | SP | 1 | 5 |
| 202006 | 1RG | SP | 1 | 6 |
| 202007 | 1RG | SP | 1 | 7 |
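I have already tried several functions, such as COUNT(*) OVER (t.ano_mes) and COUNT(*) OVER (t.site), but the running total keeps going until the end of the table and does not restart when t.site changes.
The counter should restart whenever t.site changes.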
That would be:
sum(recorrencia) over(partition by t.site order by t.ano_mes) as cum_rec
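The partition by clause resets the sum each time the site changes, and the order by clause makes it a running total in ano_mes order within each site.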
Note that if recorrencia is always 1, as in your sample data, then row_number() is sufficient:
row_number() over(partition by t.site order by t.ano_mes) as cum_rec
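To check both expressions against the sample data without a Hive cluster, here is a minimal sketch that runs the same window functions in SQLite through Python's sqlite3 module. It assumes a SQLite build with window function support (3.25 or later), a hypothetical table named t, and only a subset of the rows from the question; the window syntax used here is common to SQLite and Hive, but this is not a Hive session.

```python
import sqlite3

# Subset of the sample rows from the question: (ano_mes, site, uf, recorrencia).
rows = [
    (202001, "174", "AM", 1),
    (202002, "174", "AM", 1),
    (202003, "174", "AM", 1),
    (202005, "1JN", "SP", 1),
    (202006, "1JN", "SP", 1),
    (202005, "1LJ", "SP", 1),
    (202009, "1LJ", "SP", 1),
]

# In-memory database; the table name "t" is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (ano_mes INTEGER, site TEXT, uf TEXT, recorrencia INTEGER)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# Running sum and row number, both restarting for every site.
query = """
SELECT t.ano_mes, t.site, t.uf, t.recorrencia,
       SUM(t.recorrencia) OVER (PARTITION BY t.site ORDER BY t.ano_mes) AS cum_rec,
       ROW_NUMBER()       OVER (PARTITION BY t.site ORDER BY t.ano_mes) AS rn
FROM t
ORDER BY t.site, t.ano_mes
"""
result = conn.execute(query).fetchall()

for row in result:
    print(row)
```

With this data the two columns agree on every row. They would differ if recorrencia were not always 1, or if a site had two rows with the same ano_mes: with only an order by and no explicit frame, the sum includes all rows that tie on ano_mes, while row_number() still numbers them separately.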