Hive - Issue with a Hive subquery
My problem statement is:
"Find the top 2 districts per state with the highest population."
My data looks like this:
My expected output is:
I tried many queries and subqueries, but the subquery always fails with a SQL error.
Can anyone help me get this result?
Thanks in advance.
Queries I tried:
from population group by state_name
The query below should do it:
select A.state, collect_set(A.dist)[0], collect_set(A.dist)[1]
from (
  select state, dist,
         row_number() over (partition by state order by population desc) as rnk
  from <tableName>
) A
where A.rnk <= 2
group by A.state;
Below are the results of the collect_set step on a sample table (hier, with columns parent and child):
hive> select * from hier;
OK
C1 C11
C11 C12
C12 123
P1 C1
P2 C2
hive> select parent, collect_set(child)[0], collect_set(child)[1] from hier group by parent;
OK
C1 C11 NULL
C11 C12 NULL
C12 123 NULL
P1 C1 NULL
P2 C2 NULL
Time taken: 19.212 seconds, Fetched: 5 row(s)
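One caveat with the query above: as far as I know, collect_set does not guarantee element order and drops duplicates, so [0] is not necessarily the most populous district. A minimal sketch of a variant that pivots on the rank itself is below. The table name district_population and the output aliases top_dist_1 and top_dist_2 are hypothetical; the columns state, dist and population are taken from the query above.

```sql
-- Hypothetical table: district_population(state STRING, dist STRING, population BIGINT)
SELECT
  ranked.state,
  -- rnk = 1 is the most populous district in the state, rnk = 2 the runner-up
  MAX(CASE WHEN ranked.rnk = 1 THEN ranked.dist END) AS top_dist_1,
  MAX(CASE WHEN ranked.rnk = 2 THEN ranked.dist END) AS top_dist_2
FROM (
  SELECT
    state,
    dist,
    ROW_NUMBER() OVER (PARTITION BY state ORDER BY population DESC) AS rnk
  FROM district_population
) ranked
WHERE ranked.rnk <= 2
GROUP BY ranked.state;
```

A state with only one district gets NULL in top_dist_2, the same way the second column is NULL in the sample output above. If two districts tie on population, ROW_NUMBER picks one arbitrarily; adding dist as a second ORDER BY key would make the result deterministic.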