Max aggregate function in Hive

My current table is as follows:

ID    EventID    Time               Count
1     ABC        1435205220000      5
1     ABC        1500205220000      3
2     DEF        1435205220000      4

Desired output:

ID    EventID    Time               Count
1     ABC        1435205220000      5
2     DEF        1435205220000      4

Currently, I group by ID and EventID to get max(Count). However, I also need Time in the output, and if I add Time to the GROUP BY columns I no longer get the desired output.

Use row_number():

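select t.*
from (select t.*,
             -- number the rows in each (id, eventid) group, highest count first
             row_number() over (partition by id, eventid order by count desc) as seqnum
      from `table` t   -- `table` is a reserved word in Hive, so it is backtick-quoted
     ) t
where seqnum = 1;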
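To check the query logic against the sample data from the question, here is a minimal sketch that runs the same row_number() query through Python's built-in sqlite3 module as a stand-in for Hive. It assumes SQLite 3.25 or newer (the first version with window functions); the table name events is hypothetical, and time and count are double-quoted only to keep them safe as identifiers in SQLite.

```python
import sqlite3

# Sample rows from the question: (id, eventid, time, count).
ROWS = [
    (1, "ABC", 1435205220000, 5),
    (1, "ABC", 1500205220000, 3),
    (2, "DEF", 1435205220000, 4),
]

# Same shape as the Hive query: number the rows in each (id, eventid) group
# by descending count, then keep only the first row of each group.
QUERY = """
select id, eventid, "time", "count"
from (select e.*,
             row_number() over (partition by id, eventid
                                order by "count" desc) as seqnum
      from events e
     ) t
where seqnum = 1
order by id
"""


def top_rows(rows):
    """Load rows into an in-memory SQLite table named events and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'create table events (id integer, eventid text, "time" integer, "count" integer)'
        )
        conn.executemany("insert into events values (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_rows(ROWS):
        print(row)
```

Running it prints the two rows from the desired output, each with its Time value intact.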

EDIT:

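Window functions such as row_number() were added in Hive 0.11, so if your version does not support them I would suggest upgrading Hive. An alternative that uses only GROUP BY and a join is something like this: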

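select t.*
from `table` t join
     (select id, eventid, max(count) as maxc
      from `table`
      group by id, eventid
     ) tt
     -- keep only the rows whose count equals the maximum for their group
     on t.id = tt.id and t.eventid = tt.eventid and t.count = tt.maxc;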
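The same kind of sketch for the join approach, again with Python's sqlite3 standing in for Hive and a hypothetical table named events. The join has to match on the maximum count as well as on the grouping columns; joining on id alone would pair every row with its group's maximum and return all of them.

```python
import sqlite3

# Sample rows from the question: (id, eventid, time, count).
ROWS = [
    (1, "ABC", 1435205220000, 5),
    (1, "ABC", 1500205220000, 3),
    (2, "DEF", 1435205220000, 4),
]

# No window functions: compute max(count) per (id, eventid), then join back
# on the group key AND the max value so only the top row(s) survive.
QUERY = """
select t.id, t.eventid, t."time", t."count"
from events t
join (select id, eventid, max("count") as maxc
      from events
      group by id, eventid
     ) tt
  on t.id = tt.id
 and t.eventid = tt.eventid
 and t."count" = tt.maxc
order by t.id
"""


def top_rows(rows):
    """Load rows into an in-memory SQLite table named events and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'create table events (id integer, eventid text, "time" integer, "count" integer)'
        )
        conn.executemany("insert into events values (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_rows(ROWS):
        print(row)
```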

This returns more than one row for a group when two rows tie for the maximum count. (You can get the same behavior from the first query by using rank() instead of row_number().)
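To see the difference, the sketch below adds one made-up row that ties the maximum count for id 1 and runs the window-function query with either ranking function, using Python's sqlite3 as a stand-in for Hive (SQLite 3.25+ assumed; the table name events and the extra row are hypothetical). rank() keeps both tied rows, while row_number() keeps exactly one row per group, and which of the tied rows it picks is not defined.

```python
import sqlite3

# Sample rows from the question plus one hypothetical row that ties id 1's max count.
ROWS = [
    (1, "ABC", 1435205220000, 5),
    (1, "ABC", 1500205220000, 3),
    (2, "DEF", 1435205220000, 4),
    (1, "ABC", 1600000000000, 5),  # made-up tie for illustration
]


def top_rows(rows, ranking):
    """Run the per-(id, eventid) top-count query with the given ranking function."""
    # The function name is interpolated into the SQL text, so only the two
    # window functions being compared are accepted.
    if ranking not in ("rank", "row_number"):
        raise ValueError(f"unsupported ranking function: {ranking}")
    query = f"""
    select id, eventid, "time", "count"
    from (select e.*,
                 {ranking}() over (partition by id, eventid
                                   order by "count" desc) as seqnum
          from events e
         ) t
    where seqnum = 1
    order by id, "time"
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'create table events (id integer, eventid text, "time" integer, "count" integer)'
        )
        conn.executemany("insert into events values (?, ?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print("rank:      ", top_rows(ROWS, "rank"))
    print("row_number:", top_rows(ROWS, "row_number"))
```

With rank(), id 1 contributes both rows whose count is 5; with row_number(), it contributes only one of them.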
