Partition by after group by in Hive

Question

Suppose there is a table with some data and a column containing a date:

column1, column2, date
a, a, 2016
a, b, 2016
a, c, 2017
b, d, 2017
b, e, 2017

The task is to count the occurrences of column2 for each column1 and to take the minimum date for each column1.

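The first part is a simple group by. The second can be obtained with a partition by clause. But how can the two approaches be combined neatly and cleanly? Is a partition really needed to get the minimum date? Any wise suggestions would be great!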

Expected output:

column1, count, min_date
a, 3, 2016
b, 2, 2017

Answer 1

A simple group by is enough:

select column1,
       count(distinct column2) count, -- drop distinct to count non-NULL column2 values per column1
                                      -- use count(*) to count all rows per column1
       min(date)               min_date
from table
group by column1
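
The three counting options named in the comments above differ only when column2 has duplicates or NULLs within a column1 group. The following is a minimal sketch on made-up rows (they are not from the question), assuming that stack() accepts a typed NULL written as cast(null as string); the aliases distinct_values, non_null_values and all_rows are illustrative names only.

```sql
-- Hypothetical rows: one duplicate value and one NULL in column2
select column1,
       count(distinct column2) as distinct_values, -- unique non-NULL values
       count(column2)          as non_null_values, -- non-NULL values, duplicates included
       count(*)                as all_rows         -- every row, NULLs included
from (
  select stack(3,
               'a', 'x',
               'a', 'x',
               'a', cast(null as string)) as (column1, column2)
) s
group by column1;
```

By standard SQL aggregate semantics, the single group a should come back with 1, 2 and 3 in the three count columns.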

Let's test the group by query on sample data:

select column1,
       count(distinct column2) count, -- drop distinct to count non-NULL column2 values per column1
                                      -- use count(*) to count all rows per column1
       min(date)               min_date
from (
  select stack(6,
               'a','a', 2016,
               'a','b', 2016,
               'a','c', 2017,
               'b','d', 2017,
               'b','e', 2017,
               'c','e', 2015) as (column1, column2, date)
) s
group by column1

Result:

a   3   2016    
b   2   2017    
c   1   2015

Note that min_date holds the minimum date for each column1 value, so no partition by clause is needed here.
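
For comparison, the partition by route raised in the question could be sketched roughly as follows. This is an illustration under assumptions, not part of the original answer: the aliases cnt and w are made up, the date column is backtick-quoted as a precaution because DATE is a reserved word in newer Hive versions, and count(column2) is used instead of count(distinct column2) because distinct inside a window function is not available in every Hive version.

```sql
-- Window functions keep one output row per input row,
-- so an outer group by is still needed to collapse the duplicates.
select column1, cnt, min_date
from (
  select column1,
         count(column2) over (partition by column1) as cnt,
         min(`date`)    over (partition by column1) as min_date
  from (
    select stack(6,
                 'a','a', 2016,
                 'a','b', 2016,
                 'a','c', 2017,
                 'b','d', 2017,
                 'b','e', 2017,
                 'c','e', 2015) as (column1, column2, `date`)
  ) s
) w
group by column1, cnt, min_date;
```

On this sample data the sketch should produce the same three rows as the plain group by. It needs an extra level of nesting and a second aggregation step to do so, which is why the plain group by is the simpler choice when only per-group aggregates are wanted.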

Solution 1
0, accepted, 2017-10-05 11:50:50
