Get top n values per category in pandas retaining all columns

Question

After some transformations I got the following dataframe , how do I proceed to obtain the top n records by a column in this case short_name and using other as indicator frequency . I read this post but the problem with both solutions is that they get rid of the column product_name , they just retain the grouped column and I need to keep them all.

short_name          product_id    frequency
Yoghurt y cereales  975009684     32
Yoghurt y cereales  975009685     21
Yoghurt y cereales  975009700     16
Yoghurt y Cereales  21097         16
Yoghurt Bebible     21329         68
Yoghurt Bebible     21328         67
Yoghurt Bebible     21500         31

Answer 1

You could first sort the dataframe, then use groupby :

df.sort_values('frequency', ascending=False).groupby('short_name').head(2)
Out[28]: 
           short_name  product_id  frequency
4     Yoghurt Bebible       21329         68
5     Yoghurt Bebible       21328         67
0  Yoghurt y cereales   975009684         32
1  Yoghurt y cereales   975009685         21
3  Yoghurt y Cereales       21097         16

Answer 2

I'd try to use nlargest method:

In [5]: df.groupby('short_name', as_index=False).apply(lambda x: x.nlargest(2, 'frequency'))
Out[5]:
             short_name  product_id  frequency
0 4     Yoghurt Bebible       21329         68
  5     Yoghurt Bebible       21328         67
1 3  Yoghurt y Cereales       21097         16
2 0  Yoghurt y cereales   975009684         32
  1  Yoghurt y cereales   975009685         21

Answer 3

You can try this:

df.groupby('short_name', as_index=False).apply(lambda x: x.sort_values(by='frequency',ascending=False).head(2)).reset_index(drop=True)

Output:

           short_name  product_id  frequency
0     Yoghurt Bebible       21329         68
1     Yoghurt Bebible       21328         67
2  Yoghurt y Cereales       21097         16
3  Yoghurt y cereales   975009684         32
4  Yoghurt y cereales   975009685         21

Answer 4

If I understand your question correctly using .apply is not necessary in this case. This is something I would avoid unless there's no other option because of performance issues.

Try this:

df.sort_values(by='frequency',ascending=False).groupby('short_name').head(2).reset_index()

Answer 5

df.groupby('short_name').nlargest(2, 'frequency').reset_index()

Get top n values per category in pandas retaining all columns

Question

5 answers

solution1
5 ACCPTED 2017-06-23 19:01:01

solution2
5 2017-06-23 19:42:07

solution3
3 2017-06-23 18:56:54

solution4
3 2017-06-23 19:20:23

solution5
0 2017-12-24 21:22:10

Get top n values per category in pandas retaining all columns

Question

5 answers

solution1 5 ACCPTED 2017-06-23 19:01:01

solution2 5 2017-06-23 19:42:07

solution3 3 2017-06-23 18:56:54

solution4 3 2017-06-23 19:20:23

solution5 0 2017-12-24 21:22:10

solution1
5 ACCPTED 2017-06-23 19:01:01

solution2
5 2017-06-23 19:42:07

solution3
3 2017-06-23 18:56:54

solution4
3 2017-06-23 19:20:23

solution5
0 2017-12-24 21:22:10