Python: filter pandas dataframe to keep specified number of rows based on a column

Question

I have a pandas dataframe which looks like:

id    name    grade
1     A       10
1     A       12
1     A       15
2     B       20
3     C       19
3     C       16
3     C       11

and I need to make it look like:

id    name    grade
1     A       12
1     A       15
2     B       20
3     C       19
3     C       16

In this case I need to keep the top 2 rows for each id, ranked by highest grade. I know I could use iloc and iterate through the dataframe, but I am wondering whether there is a more pythonic way of doing this. Is this possible at all? Thanks in advance.
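For reference, here is a minimal sketch that rebuilds the sample frame and spells out the loop-based approach mentioned above. The helper name `top_n_per_group_loop` and its parameters are made up for illustration; this is only a baseline to compare against, not a recommended solution.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 3, 3, 3],
    "name":  ["A", "A", "A", "B", "C", "C", "C"],
    "grade": [10, 12, 15, 20, 19, 16, 11],
})


def top_n_per_group_loop(frame, key="id", value="grade", n=2):
    """Hypothetical helper: keep the n rows with the highest `value` per `key`."""
    pieces = []
    for _, group in frame.groupby(key):
        # Sort each group by grade, highest first, and take the first n rows by position.
        pieces.append(group.sort_values(value, ascending=False).iloc[:n])
    # Stitch the groups back together and restore the original row order.
    return pd.concat(pieces).sort_index()


result = top_n_per_group_loop(df)
print(result)
```

It works, but it builds and sorts one small frame per id in Python, which is the part I would like to avoid.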

Btw, feel free to edit the question and give it a better title if you have anything in mind.

UPDATE1: I have accepted @willem-van-onsem's answer since it was posted first and works fine for me. The other answer works well too. I am not sure about each answer's performance, so if for any reason you think the other one might be more suitable, please leave a comment here and I will update the accepted answer and the post for others.

UPDATE2: The accepted answer performs much better on large dataframes, so I am going to stick with it as the answer.
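For anyone who wants to check this on their own data, a rough timing sketch follows. The synthetic frame, its sizes, the seed, and the function names `with_nlargest` and `with_sort_head` are all assumptions for illustration; the two functions wrap the approaches from the answers below. Timings will vary with pandas version and data shape, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: sizes and seed are arbitrary choices for illustration.
rng = np.random.default_rng(0)
n_rows, n_ids = 50_000, 500
big = pd.DataFrame({
    "id": rng.integers(0, n_ids, size=n_rows),
    # A permutation keeps grades unique, so both approaches pick the same rows.
    "grade": rng.permutation(n_rows),
})


def with_nlargest(frame):
    # nlargest per group, then select the rows by their original labels.
    idx = frame.groupby("id")["grade"].nlargest(2).index.get_level_values(1)
    return frame.loc[idx].sort_index()


def with_sort_head(frame):
    # Sort once by grade, then keep the first two rows of each group.
    return (frame.sort_values("grade", ascending=False)
                 .groupby("id").head(2).sort_index())


# Sanity check: both approaches should select the same rows.
same_rows = with_nlargest(big).equals(with_sort_head(big))
print("same rows:", same_rows)

for fn in (with_nlargest, with_sort_head):
    seconds = timeit.timeit(lambda: fn(big), number=3)
    print(f"{fn.__name__}: {seconds / 3:.3f} s per run")
```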

Answer 1

Use nlargest on the grouped grade column, then select the matching rows by their original index labels:

df.loc[df.groupby('id').grade.nlargest(2).index.get_level_values(1)].sort_index()

    id  name    grade
1   1   A       12
2   1   A       15
3   2   B       20
4   3   C       19
5   3   C       16
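To make the one-liner easier to follow, here is a step-by-step sketch on the question's sample data. The intermediate variable names (`top`, `rows`) are only for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 3, 3, 3],
    "name":  ["A", "A", "A", "B", "C", "C", "C"],
    "grade": [10, 12, 15, 20, 19, 16, 11],
})

# Per id, the two largest grades. The result is a Series whose index has two
# levels: the group key (id) and the original row label.
top = df.groupby("id")["grade"].nlargest(2)

# Level 1 holds the original row labels; use them to select the full rows.
rows = top.index.get_level_values(1)

# sort_index() puts the selected rows back in their original order.
result = df.loc[rows].sort_index()
print(result)
```

The final sort_index() is optional; without it, the rows come back grouped by id with the highest grade first.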

Answer 2

We can first sort the rows by name (ascending) and grade (descending); sorting by name is not strictly required. Then we group by name and take the first two rows of each group with head. (In the sample data, name and id identify the same groups.)

df.sort_values(['name', 'grade'], ascending=[True, False]).groupby('name').head(2)

This will produce:

>>> df.sort_values(['name', 'grade'], ascending=[True, False]).groupby('name').head(2)
   grade  id name
2     15   1    A
1     12   1    A
3     20   2    B
4     19   3    C
5     16   3    C
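A self-contained variant of the same idea, as a sketch: it groups on id (the column named in the question) instead of name, and adds a final sort_index() to restore the original row order. Both changes are assumptions about what is wanted, not part of the answer above.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 3, 3, 3],
    "name":  ["A", "A", "A", "B", "C", "C", "C"],
    "grade": [10, 12, 15, 20, 19, 16, 11],
})

result = (
    df.sort_values("grade", ascending=False)  # highest grades first
      .groupby("id")                          # group on id, as in the question
      .head(2)                                # first two rows of each group
      .sort_index()                           # back to the original row order
)
print(result)
```

Because head keeps whole rows, there is no need to look the rows up again by label afterwards.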

solution1
3 2018-02-01 20:18:48

solution2
2 ACCEPTED 2018-02-01 20:13:37
