Hive Query: How to use group by with rank?
I have a table like the one below:
year int
month int
symbol string
company_name string
sector string
sub_industry string
state string
avg_open double
avg_close double
avg_low double
avg_high double
avg_volume double
The fields starting with avg_ hold the average value for one month of a given year. For each sector, I need to find the year in which the average of avg_close is the lowest.
I tried something like this:
SELECT sector, year FROM
(
SELECT sector, year, RANK() OVER (ORDER BY s2.yearly_avg_close) AS RANK FROM
( SELECT year,sector, AVG(avg_close) AS yearly_avg_close FROM stock_summary GROUP BY sector, year) s2
) s1
WHERE
s1.RANK = 1;
But this prints just one sector and year:
Telecommunications Services 2010
I am new to Hive and playing around with some toy schemas. Can someone let me know the correct way to solve this?
Hive Version - 1.1.0
Include sector in the partition by clause of the rank() function, so that the ranking restarts for each sector instead of running over the whole result set:
SELECT sector, year, RANK() OVER (partition by sector ORDER BY s2.yearly_avg_close) AS RANK
Add year to the partition by as well if you need the rank per sector and year.
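For reference, here is a sketch of how the sector-only partition might look when applied to the full query from the question. The table and column names come from the question. The alias rnk is an assumption, chosen only to avoid naming a column after the RANK function.

```sql
-- Sketch: for each sector, the year with the lowest yearly average of avg_close.
SELECT s1.sector, s1.year
FROM (
  SELECT s2.sector,
         s2.year,
         -- PARTITION BY sector restarts the ranking for every sector
         RANK() OVER (PARTITION BY s2.sector
                      ORDER BY s2.yearly_avg_close) AS rnk  -- rnk is an illustrative alias
  FROM (
    -- collapse the monthly rows into one row per sector and year
    SELECT sector, year, AVG(avg_close) AS yearly_avg_close
    FROM stock_summary
    GROUP BY sector, year
  ) s2
) s1
WHERE s1.rnk = 1;
```

RANK() assigns the same rank to ties, so a sector can return more than one year if two years share the lowest average. ROW_NUMBER() in place of RANK() would return exactly one row per sector, picking arbitrarily among ties.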
See also this explanation of how rank works: https://stackoverflow.com/a/55909947/2700344