I have a pandas dataframe which looks like:
id name grade
1 A 10
1 A 12
1 A 15
2 B 20
3 C 19
3 C 16
3 C 11
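To reproduce the example, the input frame can be built like this. It is a minimal sketch that copies the values from the table above; the variable name df is only an assumption that matches the answers below.

```python
import pandas as pd

# Sample data from the question: several grades per id, one name per id
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 3, 3, 3],
    "name":  ["A", "A", "A", "B", "C", "C", "C"],
    "grade": [10, 12, 15, 20, 19, 16, 11],
})
print(df)
```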
And I need to make it look like this:
id name grade
1 A 12
1 A 15
2 B 20
3 C 19
3 C 16
In other words, for each id I need to keep the 2 rows with the highest grades. I know I could use iloc and iterate through the dataframe, but I am wondering whether there is a more pythonic way of doing this. Is this possible at all? Thanks in advance.
Btw, feel free to edit the question and give it a better title if you have anything in mind.
UPDATE1 I have accepted @willem-van-onsem's answer since it was posted first and works fine for me. The other answer works well too. I am not sure about each answer's performance, so if for any reason you think the other one might be more suitable, please leave a comment here so I can update the accepted answer and the post for others.
UPDATE2 The accepted answer performs much better on large dataframes, which is why I am keeping it as the accepted answer.
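If you want to check the performance difference on your own data, here is a rough timing sketch comparing the two answers below (nlargest vs. sort-then-head). The frame size, the number of ids, the random grades and the function names top2_nlargest and top2_sort_head are all hypothetical; actual timings depend on your data and pandas version, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame: random grades spread over many ids.
# Adjust the sizes to resemble your real data.
rng = np.random.default_rng(0)
n_rows, n_ids = 20_000, 2_000
big = pd.DataFrame({
    "id": rng.integers(0, n_ids, n_rows),
    "grade": rng.integers(0, 100, n_rows),
})

def top2_nlargest(df):
    # nlargest answer: take the original row labels from level 1 of the MultiIndex
    idx = df.groupby("id")["grade"].nlargest(2).index.get_level_values(1)
    return df.loc[idx].sort_index()

def top2_sort_head(df):
    # Accepted answer: sort so the highest grades come first, then keep 2 rows per group
    return (df.sort_values(["id", "grade"], ascending=[True, False])
              .groupby("id").head(2)
              .sort_index())

for fn in (top2_nlargest, top2_sort_head):
    # Average wall-clock time over 3 calls
    seconds = timeit.timeit(lambda: fn(big), number=3) / 3
    print(f"{fn.__name__}: {seconds:.3f} s per call")
```

With random integer grades there are ties, so the two functions may pick different rows for the same tied grade; the (id, grade) pairs they keep should still match.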
Use nlargest
df.loc[df.groupby('id').grade.nlargest(2).index.get_level_values(1)].sort_index()
id name grade
1 1 A 12
2 1 A 15
3 2 B 20
4 3 C 19
5 3 C 16
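A self-contained sketch of the same one-liner, split into steps so the intermediate result is visible. The frame is rebuilt from the question's sample data; the names top and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 3, 3, 3],
    "name":  ["A", "A", "A", "B", "C", "C", "C"],
    "grade": [10, 12, 15, 20, 19, 16, 11],
})

# For each id, the 2 largest grades. The result is a Series whose
# MultiIndex is (id, original row label).
top = df.groupby("id")["grade"].nlargest(2)
print(top)

# Level 1 holds the original row labels: use them to pull the full rows
# (including name) back out of df, then restore the original row order.
result = df.loc[top.index.get_level_values(1)].sort_index()
print(result)
```

nlargest defaults to keep='first', so if two rows tie for the last kept place within an id, only the earlier one is returned.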
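We can first sort the rows on name (ascending) and grade (descending) (sorting by name is not strictly required), then groupby name, and then take the first two rows of each group with head (name and id map one-to-one in this data, so grouping by name gives the same groups as grouping by id):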
df.sort_values(['name', 'grade'], ascending=[True, False]).groupby('name').head(2)
This will produce:
>>> df.sort_values(['name', 'grade'], ascending=[True, False]).groupby('name').head(2)
grade id name
2 15 1 A
1 12 1 A
3 20 2 B
4 19 3 C
5 16 3 C
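A self-contained sketch of this approach that groups by id, as the question asks, instead of name. It assumes the same sample data; in this sample the two groupings give identical results.

```python
import pandas as pd

df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 3, 3, 3],
    "name":  ["A", "A", "A", "B", "C", "C", "C"],
    "grade": [10, 12, 15, 20, 19, 16, 11],
})

# Sort by id ascending and grade descending, so that within each id
# the highest grades come first.
ordered = df.sort_values(["id", "grade"], ascending=[True, False])

# groupby(...).head(n) keeps the first n rows of each group in the
# current (sorted) order and returns them with all columns intact.
result = ordered.groupby("id").head(2)
print(result)
```

The rows come out in sorted order (index 2, 1, 3, 4, 5); appending .sort_index() restores the original row order if that matters.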