Cumulative sum by year_month, location, state in Hive SQL
I want to compute a cumulative count over the recorrencia column, based on the other columns, using Hive SQL.
+------------+---------+-------+--------------+
| t.ano_mes | t.site | t.uf | recorrencia |
+------------+---------+-------+--------------+
| 202001 | 174 | AM | 1 |
| 202002 | 174 | AM | 1 |
| 202003 | 174 | AM | 1 |
| 202004 | 174 | AM | 1 |
| 202005 | 174 | AM | 1 |
| 202006 | 174 | AM | 1 |
| 202007 | 174 | AM | 1 |
| 202008 | 174 | AM | 1 |
| 202005 | 1JN | SP | 1 |
| 202006 | 1JN | SP | 1 |
| 202005 | 1LJ | SP | 1 |
| 202009 | 1LJ | SP | 1 |
| 202001 | 1RG | SP | 1 |
| 202002 | 1RG | SP | 1 |
| 202003 | 1RG | SP | 1 |
| 202004 | 1RG | SP | 1 |
| 202005 | 1RG | SP | 1 |
| 202006 | 1RG | SP | 1 |
| 202007 | 1RG | SP | 1 |
Expected output:
+------------+---------+-------+--------------+---------+
| t.ano_mes | t.site | t.uf | recorrencia | cum_rec |
+------------+---------+-------+--------------+---------+
| 202001 | 174 | AM | 1 | 1 |
| 202002 | 174 | AM | 1 | 2 |
| 202003 | 174 | AM | 1 | 3 |
| 202004 | 174 | AM | 1 | 4 |
| 202005 | 174 | AM | 1 | 5 |
| 202006 | 174 | AM | 1 | 6 |
| 202007 | 174 | AM | 1 | 7 |
| 202008 | 174 | AM | 1 | 8 |
| 202005 | 1JN | SP | 1 | 1 |
| 202006 | 1JN | SP | 1 | 2 |
| 202005 | 1LJ | SP | 1 | 1 |
| 202009 | 1LJ | SP | 1 | 2 |
| 202001 | 1RG | SP | 1 | 1 |
| 202002 | 1RG | SP | 1 | 2 |
| 202003 | 1RG | SP | 1 | 3 |
| 202004 | 1RG | SP | 1 | 4 |
| 202005 | 1RG | SP | 1 | 5 |
| 202006 | 1RG | SP | 1 | 6 |
| 202007 | 1RG | SP | 1 | 7 |
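I have already tried many functions, such as COUNT(*) OVER (t.ano_mes) and COUNT(*) OVER (t.site), but the sum keeps running until the end of the table and does not restart when t.site changes.
The counter should restart as soon as t.site changes.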
That would be:
sum(recorrencia) over(partition by t.site order by t.ano_mes) as cum_rec
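The partition by clause resets the sum every time the site changes, and order by t.ano_mes makes the sum accumulate month by month within each site.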
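For context, here is a minimal sketch of how that expression might sit in a complete Hive query. The table name `my_table` is a placeholder, since the original post does not name the table. The explicit `ROWS` frame is an addition rather than part of the answer: when `ORDER BY` is given without a frame, the default is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, so two rows with the same `ano_mes` inside one site would get the same running total. With the sample data, where each site has at most one row per month, both forms give the same result.

```sql
-- Sketch only: `my_table` is a hypothetical table name, not from the original post.
SELECT
    t.ano_mes,
    t.site,
    t.uf,
    t.recorrencia,
    SUM(t.recorrencia) OVER (
        PARTITION BY t.site          -- restart the running total for each site
        ORDER BY t.ano_mes           -- accumulate in year_month order
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  -- explicit frame (assumption, see above)
    ) AS cum_rec
FROM my_table t
ORDER BY t.site, t.ano_mes;          -- only for readable output; not needed for the window itself
```

If the same site code could appear under more than one state, `PARTITION BY t.site, t.uf` would keep those counters separate; the sample data does not show such a case.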
Note that if recorrencia is always 1, as in your sample data, then row_number() is sufficient:
row_number() over(partition by t.site order by t.ano_mes) as cum_rec
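The two expressions only agree while recorrencia is 1 on every row. The following sketch uses a small inline data set (made-up values, not from the question) to show where they diverge. It assumes a Hive version that supports CTEs and SELECT without FROM.

```sql
-- Hypothetical sample rows: recorrencia is deliberately not always 1.
WITH sample AS (
    SELECT 202001 AS ano_mes, '174' AS site, 2 AS recorrencia
    UNION ALL SELECT 202002, '174', 1
    UNION ALL SELECT 202005, '1JN', 3
)
SELECT
    ano_mes,
    site,
    recorrencia,
    -- running total of the values: 2, 3 for site 174; 3 for site 1JN
    SUM(recorrencia) OVER (PARTITION BY site ORDER BY ano_mes) AS cum_sum,
    -- position of the row within its site: 1, 2 for site 174; 1 for site 1JN
    ROW_NUMBER() OVER (PARTITION BY site ORDER BY ano_mes) AS row_num
FROM sample;
```

So row_number() counts rows, while sum(recorrencia) adds up values. Pick whichever matches what cum_rec is meant to represent.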