Group by fields in Hive to get all columns
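I have the following dataset, and I want to get specific column values based on a group by (or some other function) on particular columns. My dataset looks like this: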
id  zip  Action  content  duration  OS       TIME
=================================================
1   11   START   DELL               LINUX    12
1   11   JUMP    HP                 UNIX     14
1   11   STOP    HP       10        LINUX    16
1   11   START   WIN                LINUX    2
1   11   JUMP    HP                 UNIX     4
1   11   STOP    SONY     12        LINUX    15
2   12   START   HP                 UNIX     3
2   12   STOP    FOP      2         WINDOWS  10
-------------------------------------------------
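For each (id, zip) group, I want all column values of the row where Action = 'STOP' that has the maximum TIME among those filtered records. My expected output would be: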
id  zip  Action  content  duration  OS
=========================================
1   11   STOP    HP       10        LINUX
2   12   STOP    FOP      2         WINDOWS
-----------------------------------------
How can I achieve this using Hive? Please help.
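You can use row_number() for this: filter to the STOP rows, number them within each (id, zip) partition by descending time, and keep the first row of each partition.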
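select id, zip, Action, content, duration, OS
from (
    -- number the STOP rows of each (id, zip) group, latest time first
    select *,
           row_number() over (
               partition by id, zip
               order by time desc
           ) as rn
    from mytable
    where action = 'STOP'
) t
where rn = 1  -- keep only the latest STOP row per group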
+----+-----+--------+---------+----------+---------+
| id | zip | action | content | duration | os |
+----+-----+--------+---------+----------+---------+
| 1 | 11 | STOP | HP | 10 | LINUX |
| 2 | 12 | STOP | FOP | 2 | WINDOWS |
+----+-----+--------+---------+----------+---------+
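
The same result can also be sketched without a window function, by joining the STOP rows to the maximum STOP time of each (id, zip) group. This is only an illustrative alternative, not part of the original answer. It assumes the table is named mytable as above, and the backticks around time are a precaution in case your Hive version treats it as a reserved keyword.

```sql
-- Alternative sketch: join each STOP row to the latest STOP time of its (id, zip) group.
-- Assumes the table name mytable from the answer above.
select t.id, t.zip, t.action, t.content, t.duration, t.os
from mytable t
join (
    -- latest STOP time per (id, zip)
    select id, zip, max(`time`) as max_time
    from mytable
    where action = 'STOP'
    group by id, zip
) m
  on  t.id = m.id
  and t.zip = m.zip
  and t.`time` = m.max_time
where t.action = 'STOP';
```

One difference to keep in mind: if two STOP rows in the same group share the maximum time, this join returns both of them, while the row_number() version returns exactly one (chosen arbitrarily among the ties).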