Python: filter pandas dataframe to keep specified number of rows based on a column

I have a pandas dataframe which looks like:

id    name    grade
1     A       10
1     A       12
1     A       15
2     B       20
3     C       19
3     C       16
3     C       11

And I need to make it look like:

id    name    grade
1     A       12
1     A       15
2     B       20
3     C       19
3     C       16

In this case I need to keep the top 2 rows for each id, ranked by highest grade. I know I could use iloc and iterate through the dataframe, but I am wondering if there is a more pythonic way of doing this. Is this possible at all? Thanks in advance.

Btw, feel free to edit the question and give it a better title if you have anything in mind.

UPDATE1 I have accepted @willem-van-onsem's answer since it was posted first and works fine for me. The other answer works well too. I am not sure about each answer's performance, so if for any reason you think the other one might be more suitable, please leave a comment here so I can update the accepted answer and the post for others.

UPDATE2 The accepted answer performs much better on large dataframes, and that's why I am going to stick with it as the answer.

Use nlargest:

df.loc[df.groupby('id').grade.nlargest(2).index.get_level_values(1)].sort_index()

    id  name    grade
1   1   A       12
2   1   A       15
3   2   B       20
4   3   C       19
5   3   C       16
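The one-liner above can be unpacked into separate steps. This is only a sketch: it rebuilds the sample frame from the question (the construction itself is an assumption, since the question shows only the printed table) and uses intermediate names `top` and `row_labels` that are purely illustrative.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand (assumed construction).
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 3, 3],
    "name": ["A", "A", "A", "B", "C", "C", "C"],
    "grade": [10, 12, 15, 20, 19, 16, 11],
})

# Two largest grades per id. The result is a Series whose index has
# two levels: the group key (id) and the original row label.
top = df.groupby("id")["grade"].nlargest(2)

# Level 1 of that index holds the original row labels.
row_labels = top.index.get_level_values(1)

# Select those rows and restore the original row order.
result = df.loc[row_labels].sort_index()
print(result)
```

Calling sort_index() at the end is optional; it only puts the selected rows back in the order they had in the original frame.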

We can first sort the rows by name (ascending) and grade (descending); sorting by name is not strictly required. Then we group by name and take the first two rows of each group with head. In the sample data each id has exactly one name, so grouping by name gives the same groups as grouping by id:

df.sort_values(['name', 'grade'], ascending=[True, False]).groupby('name').head(2)

This will produce:

>>> df.sort_values(['name', 'grade'], ascending=[True, False]).groupby('name').head(2)
   grade  id name
2     15   1    A
1     12   1    A
3     20   2    B
4     19   3    C
5     16   3    C
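As a variation on the same idea, here is a minimal sketch that groups by id, as the question asks, rather than by name. It sorts on grade only and restores the original row order afterwards. The frame construction and the name `top2` are assumptions made for illustration.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand (assumed construction).
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 3, 3],
    "name": ["A", "A", "A", "B", "C", "C", "C"],
    "grade": [10, 12, 15, 20, 19, 16, 11],
})

top2 = (
    df.sort_values("grade", ascending=False)  # highest grades first
      .groupby("id")                          # one group per id
      .head(2)                                # first two rows of each group
      .sort_index()                           # back to original row order
)
print(top2)
```

Because head() keeps the original row labels, the final sort_index() yields the rows in the order shown in the question's desired output. If grades can tie within an id, which of the tied rows is kept depends on the sort, so a tie-breaking column may be worth adding to sort_values.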
