Python: filter pandas dataframe to keep specified number of rows based on a column
I have a pandas dataframe which looks like:
id name grade
1 A 10
1 A 12
1 A 15
2 B 20
3 C 19
3 C 16
3 C 11
And need to make it look like:
id name grade
1 A 12
1 A 15
2 B 20
3 C 19
3 C 16
In this case I need to keep the top 2 rows for each id, i.e. the rows with the highest grades. I know I could use iloc and iterate through the dataframe, but I am wondering if there is a more pythonic way of doing this. Is this possible at all? Thanks in advance.
Btw, feel free to edit the question and give it a better title if you have anything in mind.
UPDATE1 I have accepted @willem-van-onsem's answer since it was posted first and works fine for me. The other answer works well too. I am not sure about each answer's performance, so if for any reason you think the other one might be more suitable, please leave a comment here so I can update the accepted answer and the post for others.
UPDATE2 The accepted answer works way better on large dataframes, and that's why I am going to stick to it as the answer.
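For anyone who wants to check the performance difference on their own machine, here is a rough timing sketch. It is only an illustration under assumptions: the synthetic frame `big`, its sizes, and the helper names `with_nlargest` and `with_sort_head` are made up for this example, and the sort-based helper groups on id rather than name. Timings will vary with data shape and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows, n_groups = 10_000, 500  # hypothetical sizes, adjust to taste

big = pd.DataFrame({
    "id": rng.integers(0, n_groups, size=n_rows),
    # A permutation gives unique grades, so ties cannot change which rows win.
    "grade": rng.permutation(n_rows),
})


def with_nlargest(frame):
    # Top 2 grades per id; level 1 of the result index holds the original row labels.
    labels = frame.groupby("id")["grade"].nlargest(2).index.get_level_values(1)
    return frame.loc[labels].sort_index()


def with_sort_head(frame):
    # Sort grades descending within each id, then keep the first 2 rows per id.
    ranked = frame.sort_values(["id", "grade"], ascending=[True, False])
    return ranked.groupby("id").head(2).sort_index()


# Both helpers should select exactly the same rows on this data.
assert with_nlargest(big).equals(with_sort_head(big))

for fn in (with_nlargest, with_sort_head):
    seconds = timeit.timeit(lambda: fn(big), number=3)
    print(f"{fn.__name__}: {seconds / 3:.4f} s per run")
```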
Use nlargest
df.loc[df.groupby('id').grade.nlargest(2).index.get_level_values(1)].sort_index()
id name grade
1 1 A 12
2 1 A 15
3 2 B 20
4 3 C 19
5 3 C 16
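To make the one-liner above easier to follow, here is a self-contained sketch that splits it into steps. It assumes the frame from the question with a default integer index; the variable names `top2` and `result` are just for illustration.

```python
import pandas as pd

# Example frame from the question, with the default 0..6 index.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 3, 3],
    "name": ["A", "A", "A", "B", "C", "C", "C"],
    "grade": [10, 12, 15, 20, 19, 16, 11],
})

# Top 2 grades per id. The result is a Series with a two-level index:
# level 0 is the group key (id), level 1 is the original row label.
top2 = df.groupby("id")["grade"].nlargest(2)

# Use the original row labels to select full rows, then restore
# the original row order with sort_index().
result = df.loc[top2.index.get_level_values(1)].sort_index()
print(result)
```

Without the final `sort_index()`, the rows come back ordered by id and then by descending grade within each id.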
We can first sort the rows on name (ascending) and grade (descending) (sorting by name is not strictly required), then groupby name, and then take the first two rows of each group with head:
df.sort_values(['name', 'grade'], ascending=[True, False]).groupby('name').head(2)
This will produce:
>>> df.sort_values(['name', 'grade'], ascending=[True, False]).groupby('name').head(2)
grade id name
2 15 1 A
1 12 1 A
3 20 2 B
4 19 3 C
5 16 3 C
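For reference, the same approach as a self-contained sketch, assuming the frame from the question with a default integer index. The names `ranked`, `top2` and `restored` are illustrative. Note that this groups on name, which works here because each id has exactly one name; if that does not hold in your data, grouping on id instead is probably what you want.

```python
import pandas as pd

# Example frame from the question, with the default 0..6 index.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 3, 3],
    "name": ["A", "A", "A", "B", "C", "C", "C"],
    "grade": [10, 12, 15, 20, 19, 16, 11],
})

# Highest grades first within each name.
ranked = df.sort_values(["name", "grade"], ascending=[True, False])

# head(2) keeps the first two rows of every group, in the sorted order.
top2 = ranked.groupby("name").head(2)
print(top2)

# Optional: go back to the original row order.
restored = top2.sort_index()
print(restored)
```

The column order in the printed frame may differ from the output above depending on how the frame was built and on the pandas version.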