Hive query: how to use rank with group by?

Question

I have a table like the one below:

year            int
month           int
symbol          string
company_name    string
sector          string
sub_industry    string
state           string
avg_open        double
avg_close       double
avg_low         double
avg_high        double
avg_volume      double

The fields starting with avg_ hold the average value for one month of a given year. For each sector, I need to find the year in which the average of avg_close is the lowest.

I tried something like the following:

SELECT sector, year FROM
  (
    SELECT sector, year, RANK() OVER (ORDER BY s2.yearly_avg_close) AS RANK FROM
      ( SELECT year,sector, AVG(avg_close) AS yearly_avg_close FROM stock_summary GROUP BY sector, year) s2
  ) s1 
WHERE
  s1.RANK = 1;

But this prints just one sector and year, as shown below:

Telecommunications Services     2010

I am new to Hive and playing around with some toy schemas. Can someone let me know the correct way to solve this?

Hive version: 1.1.0

Answer 1

Include sector in the PARTITION BY clause of the rank() function, so that the ranking restarts for each sector instead of running over the whole result set:

SELECT sector, year, RANK() OVER (partition by sector ORDER BY s2.yearly_avg_close) AS RANK

Add year to the partition as well if you need a rank per sector and year.

See also this explanation of how rank works: https://stackoverflow.com/a/55909947/2700344
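For reference, here is a sketch of how the query from the question might look with that change applied. It assumes the stock_summary table and columns from the question. The alias rnk is my own choice, used only to avoid reusing the function name RANK as a column alias.

```sql
-- Sketch only: the question's query with PARTITION BY sector added.
-- rnk is an illustrative alias, not something from the original post.
SELECT s1.sector, s1.year
FROM (
    SELECT s2.sector,
           s2.year,
           -- Ranking restarts at 1 within each sector; the lowest yearly average gets rank 1.
           RANK() OVER (PARTITION BY s2.sector ORDER BY s2.yearly_avg_close) AS rnk
    FROM (
        -- One row per (sector, year) with the yearly average of the monthly avg_close values.
        SELECT sector, year, AVG(avg_close) AS yearly_avg_close
        FROM stock_summary
        GROUP BY sector, year
    ) s2
) s1
WHERE s1.rnk = 1;
```

Note that RANK() assigns the same rank to ties, so a sector with two years sharing the lowest average would return both rows. If exactly one row per sector is wanted, ROW_NUMBER() could be used in its place.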

Answer 1: accepted, 2020-03-13 04:50:36
