Max aggregate function in hive

Question

My current table is as follows:

ID    EventID    Time               Count
1     ABC        1435205220000      5
1     ABC        1500205220000      3
2     DEF        1435205220000      4

Output:

ID    EventID    Time               Count
1     ABC        1435205220000      5
2     DEF        1435205220000      4

Currently, I Group by ID, EventID to get max (Count). However, I need Time in the output as well, but if I add Time to Group By column then I wont get the desired output.

Answer 1

Use row_number() :

select t.*
from (select t.*,
             row_number() over (partition by id order by count desc) as seqnum
      from table t
     ) t
where seqnum = 1;

EDIT:

I would suggest that you update the version of Hive. An alternative is something like this:

select t.*
from table t join
     (select id, max(count) as maxc
      from table t
      group by id
     ) tt
     on t.id = tt.id;

This will return duplicate rows if an id has two with the same max count. (You can get the same effect in the first query by using rank() instead of row_number() .

Max aggregate function in hive

Question

1 answers

solution1
1 2015-07-27 02:08:56

Max aggregate function in hive

Question

1 answers

solution1 1 2015-07-27 02:08:56

solution1
1 2015-07-27 02:08:56