Partition by after group by in Hive

Question

Suppose there is a table with some data and a column containing a date:

column1, column2, date
a, a, 2016
a, b, 2016
a, c, 2017
b, d, 2017
b, e, 2017

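The task is to count the occurrences of column2 for each column1, and to get the minimum date for each column1.

The first part is a simple group by. The second can be obtained with a partition by clause. But how can the two be combined cleanly? Is partition by really needed to get the minimum date? Any sensible suggestions would be great!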

Expected output:

column1, count, min_date
a, 3, 2016
b, 2, 2017

Answer 1

A simple group by is enough:

select column1, 
       count(distinct column2) count, -- drop DISTINCT to count non-null column2 values per column1
                                      -- use count(*) to count all rows per column1
       min(date)               min_date
from table
group by column1

Let's test it:

select column1, 
       count(distinct column2) count, -- drop DISTINCT to count non-null column2 values per column1
                                      -- use count(*) to count all rows per column1
       min(date)               min_date
from (
select 
stack(6,
'a','a', 2016, 
'a','b', 2016, 
'a','c', 2017, 
'b','d', 2017, 
'b','e', 2017, 
'c','e', 2015) as( column1, column2, date)
) s
group by column1

Result:

a   3   2016    
b   2   2017    
c   1   2015

Note that min_date holds the minimum date for each column1 value, so no partition by is needed.
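
For comparison, here is a rough sketch of what the partition by route from the question might look like on the same test data. It is not part of the original answer. The alias names cnt, min_date and rn are illustrative, and `date` is backtick-quoted on the assumption that it may be treated as a reserved word in your Hive version. It uses count(column2) rather than count(distinct column2), so the two queries agree only when column2 has no duplicates within a column1 group, as is the case here.

```sql
-- Window-function variant: every row carries its group's count and minimum date,
-- then row_number() keeps a single row per column1.
select column1, cnt, min_date
from (
  select column1,
         count(column2) over (partition by column1)                  as cnt,
         min(`date`)    over (partition by column1)                  as min_date,
         row_number()   over (partition by column1 order by column2) as rn
  from (
    select
    stack(6,
    'a','a', 2016,
    'a','b', 2016,
    'a','c', 2017,
    'b','d', 2017,
    'b','e', 2017,
    'c','e', 2015) as (column1, column2, `date`)
  ) s
) t
where rn = 1
```

This should return the same three rows as the group by version (a 3 2016, b 2 2017, c 1 2015, in no guaranteed order), at the cost of an extra subquery and a window pass. Window functions are more useful when the per-group values have to be shown next to every detail row rather than collapsed to one row per group.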

Solution 1
0 accepted 2017-10-05 11:50:50