
Logic in Python to return only one row among similar rows based on the maximum value in a column in a dataframe

I am new to Python and need a solution for the example below. This is what my df looks like:

Index  classcode  product_id  Season  Sales  Score
1      65 102 00  210190062   2018_2  1000   3
2      65 102 00  210190062   2018_2  1000   5
89     66 107 00  210189987   2018_4  1500   10
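To make the example reproducible, here is a minimal sketch that rebuilds the frame above. It assumes classcode is stored as a string (its values contain spaces), product_id and Sales as integers, and that "Index" is the frame's index holding the labels 1, 2 and 89.

```python
import pandas as pd

# Rebuild the sample rows; "Index" holds the labels 1, 2 and 89 from the question.
df = pd.DataFrame(
    {
        "classcode": ["65 102 00", "65 102 00", "66 107 00"],
        "product_id": [210190062, 210190062, 210189987],
        "Season": ["2018_2", "2018_2", "2018_4"],
        "Sales": [1000, 1000, 1500],
        "Score": [3, 5, 10],
    },
    index=pd.Index([1, 2, 89], name="Index"),
)

# Summing Sales naively counts the duplicated 1000 twice.
print(df["Sales"].sum())  # 3500
```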

Of the rows with Index 1 and 2, I need just one, chosen by the min or max value of the 'Score' column, plus the row with Index 89. The only value that differs between Index 1 and 2 is the Score, which is always unique, while the rest of the columns are identical. The Score is never the same for the same product_id, classcode, or any other column(s) in the df. I just want to eliminate the double counting of the Sales. Is there a function or logic in pandas to achieve this?

I tried creating a new data frame by returning all columns and grouping them by the max of Score, but it did not work. I have done this in SQL using window functions but am not sure what to do here. The Index is the default index created from the data frame. The expected output for the example would look like this:
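Since the question mentions solving this in SQL with window functions, one possible pandas analogue of `ROW_NUMBER() OVER (PARTITION BY classcode, product_id, Season, Sales ORDER BY Score DESC) = 1` is a per-group rank. This is a sketch, assuming the grouping columns are every column except Score; `rank(method="first")` stands in for ROW_NUMBER, and `ascending=True` would keep the minimum Score instead.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "classcode": ["65 102 00", "65 102 00", "66 107 00"],
        "product_id": [210190062, 210190062, 210189987],
        "Season": ["2018_2", "2018_2", "2018_4"],
        "Sales": [1000, 1000, 1500],
        "Score": [3, 5, 10],
    },
    index=pd.Index([1, 2, 89], name="Index"),
)

cols = ["classcode", "product_id", "Season", "Sales"]

# Like ROW_NUMBER() OVER (PARTITION BY cols ORDER BY Score DESC):
# number the rows inside each group, highest Score first.
row_number = df.groupby(cols)["Score"].rank(method="first", ascending=False)

# Keep only the first row of every partition (original Index labels survive).
result = df[row_number == 1]
print(result)
```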

Index  classcode  product_id  Season  Sales  Score
2      65 102 00  210190062   2018_2  1000   5
89     66 107 00  210189987   2018_4  1500   10

I think this should work.

I am just assuming your dataframe is foo:

foo.groupby(['classcode','product_id','Season','Sales'])['Score'].max()
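Note that this expression returns a Series of maximum scores indexed by the four grouping columns, not the original rows, so the Index labels (2 and 89) are lost. A minimal sketch, assuming the frame is named foo as in the answer, that turns the result back into a flat DataFrame with `reset_index()`:

```python
import pandas as pd

foo = pd.DataFrame(
    {
        "classcode": ["65 102 00", "65 102 00", "66 107 00"],
        "product_id": [210190062, 210190062, 210189987],
        "Season": ["2018_2", "2018_2", "2018_4"],
        "Sales": [1000, 1000, 1500],
        "Score": [3, 5, 10],
    },
    index=pd.Index([1, 2, 89], name="Index"),
)

# groupby(...).max() yields a Series keyed by the grouping columns.
best = foo.groupby(["classcode", "product_id", "Season", "Sales"])["Score"].max()

# reset_index() moves the group keys back into ordinary columns
# and replaces the original labels with a fresh 0..n-1 index.
flat = best.reset_index()
print(flat)
```

This recovers full rows only because every column other than Score is a grouping key; any extra column would be dropped.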

There are several ways to do this:

groupby & transform

cols = ['classcode', 'product_id', 'Season', 'Sales']

df[df['Score'].eq(df.groupby(cols)['Score'].transform('max'))]

       classcode  product_id  Season  Sales  Score
Index                                             
2      65 102 00   210190062  2018_2   1000      5
89     66 107 00   210189987  2018_4   1500     10
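Another common variant, not part of the original answer, uses `idxmax` to find the label of the highest-Score row in each group and then selects those rows with `loc`. This sketch assumes the index labels are unique, which holds for a default index; `idxmin` would keep the lowest Score instead. Unlike the transform mask, it returns exactly one row per group even if two rows tie on the maximum.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "classcode": ["65 102 00", "65 102 00", "66 107 00"],
        "product_id": [210190062, 210190062, 210189987],
        "Season": ["2018_2", "2018_2", "2018_4"],
        "Sales": [1000, 1000, 1500],
        "Score": [3, 5, 10],
    },
    index=pd.Index([1, 2, 89], name="Index"),
)

cols = ["classcode", "product_id", "Season", "Sales"]

# Label of the row with the highest Score in each group.
best_labels = df.groupby(cols)["Score"].idxmax()

# Select those rows by label, keeping the original Index values.
result = df.loc[best_labels]
print(result)
```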

sort_values & drop_duplicates

cols = ['classcode', 'product_id', 'Season', 'Sales']

df.sort_values(cols + ['Score']).drop_duplicates(cols, keep='last')

       classcode  product_id  Season  Sales  Score
Index                                             
2      65 102 00   210190062  2018_2   1000      5
89     66 107 00   210189987  2018_4   1500     10
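The question allows keeping either the minimum or the maximum Score. With the sort-then-deduplicate approach, a sketch of both choices differs only in `keep`, and a final sum confirms the duplicated Sales figure is counted once. It assumes, as the question states, that Score is the only column that differs among otherwise identical rows, so `drop_duplicates` is given only the other four columns.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "classcode": ["65 102 00", "65 102 00", "66 107 00"],
        "product_id": [210190062, 210190062, 210189987],
        "Season": ["2018_2", "2018_2", "2018_4"],
        "Sales": [1000, 1000, 1500],
        "Score": [3, 5, 10],
    },
    index=pd.Index([1, 2, 89], name="Index"),
)

cols = ["classcode", "product_id", "Season", "Sales"]
ordered = df.sort_values(cols + ["Score"])

# Within each group the rows are now in ascending Score order:
# the last one has the max Score, the first one the min Score.
max_rows = ordered.drop_duplicates(cols, keep="last")
min_rows = ordered.drop_duplicates(cols, keep="first")

print(max_rows)
print(min_rows)
print(max_rows["Sales"].sum())  # 2500
```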
