Python logic to return only one row from similar rows based on the max value of a column in a DataFrame

Question

I am new to Python and need a solution for the following example. This is what my df looks like:

Index   classcode   product_id  Season Sales Score
1      65 102 00    210190062   2018_2  1000   3
2      65 102 00    210190062   2018_2  1000   5
89     66 107 00    210189987   2018_4  1500   10

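I need only one row for indexes 1 and 2, chosen by the minimum or maximum value of the Score column, plus the row at index 89. The only value that differs between indexes 1 and 2 is the score, which is always unique; all other elements are identical. The score is never the same for rows that share a product_id, a classcode, or any other column in the df. I just want to eliminate the double counting of sales.

Does pandas have a function or logic to achieve this? I tried creating a new DataFrame by returning all columns and grouping them by the max of Score, but that did not work. I have done this in SQL with window functions, but I am not sure what to do here. The index is the default index created from the DataFrame. The expected output for this example is shown below: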

Index   classcode   product_id  Season Sales Score
2      65 102 00    210190062   2018_2  1000   5
89     66 107 00    210189987   2018_4  1500   10

Answer 1

I think this should work.

I am just assuming your DataFrame is named foo.

foo.groupby(['classcode','product_id','Season','Sales'])['Score'].max()
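A `groupby(...)['Score'].max()` call yields the maximum score per group, keyed by the grouping columns rather than by the original row labels, so index 2 and index 89 are not visible in its result. As a minimal sketch, assuming the sample data from the question and the name `foo` used in this answer, `idxmax` can be used to look up the original rows instead. The variable names `keys`, `max_scores`, and `best_rows` are illustrative only.

```python
import pandas as pd

# Sample data rebuilt from the question; `foo` follows the naming in this answer.
foo = pd.DataFrame(
    {
        "classcode": ["65 102 00", "65 102 00", "66 107 00"],
        "product_id": [210190062, 210190062, 210189987],
        "Season": ["2018_2", "2018_2", "2018_4"],
        "Sales": [1000, 1000, 1500],
        "Score": [3, 5, 10],
    },
    index=pd.Index([1, 2, 89], name="Index"),
)

keys = ["classcode", "product_id", "Season", "Sales"]

# Max score per group: a Series indexed by the group keys, not by 1/2/89.
max_scores = foo.groupby(keys)["Score"].max()

# idxmax returns the original index label of the max-Score row in each group,
# so .loc can pull back the complete rows with their original labels.
best_rows = foo.loc[foo.groupby(keys)["Score"].idxmax()]
print(best_rows)
```

Swapping `idxmax` for `idxmin` would keep the lowest-scoring row of each group instead.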

Answer 2

There are a few ways to do this:

`groupby` and `transform`

cols = ['classcode', 'product_id', 'Season', 'Sales']

df[df['Score'].eq(df.groupby(cols)['Score'].transform('max'))]

       classcode  product_id  Season  Sales  Score
Index                                             
2      65 102 00   210190062  2018_2   1000      5
89     66 107 00   210189987  2018_4   1500     10
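To make the intermediate step visible, here is a self-contained sketch of the `transform` approach, assuming the sample data from the question. `transform('max')` broadcasts each group's maximum back onto every row of that group, which is what allows the element-wise comparison with `Score`. The names `group_max`, `max_rows`, and `min_rows` are illustrative, and the `min` variant is included because the question asks for either extreme.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame(
    {
        "classcode": ["65 102 00", "65 102 00", "66 107 00"],
        "product_id": [210190062, 210190062, 210189987],
        "Season": ["2018_2", "2018_2", "2018_4"],
        "Sales": [1000, 1000, 1500],
        "Score": [3, 5, 10],
    },
    index=pd.Index([1, 2, 89], name="Index"),
)

cols = ["classcode", "product_id", "Season", "Sales"]

# One value per original row: the max Score of the group that row belongs to.
group_max = df.groupby(cols)["Score"].transform("max")

# Keep rows whose Score equals their group's max (or min).
max_rows = df[df["Score"].eq(group_max)]
min_rows = df[df["Score"].eq(df.groupby(cols)["Score"].transform("min"))]

print(max_rows)
print(min_rows)
```

If two rows in a group ever tied on the extreme score, this filter would keep both; the question states that scores are unique, so that case does not arise here.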

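`sort_values` and `drop_duplicates`

cols = ['classcode', 'product_id', 'Season', 'Sales']

# Sort so the highest Score comes last within each group, then keep that last row.
# Score must not be in the drop_duplicates subset, or no row would count as a duplicate.
df.sort_values(cols + ['Score']).drop_duplicates(cols, keep='last')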

       classcode  product_id  Season  Sales  Score
Index                                             
2      65 102 00   210190062  2018_2   1000      5
89     66 107 00   210189987  2018_4   1500     10
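Since the stated goal is to stop double counting sales, the sketch below runs the sort-then-deduplicate approach on the question's sample data and compares the Sales total before and after. It assumes the grouping keys are every column except Score; the names `deduped`, `total_before`, and `total_after` are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame(
    {
        "classcode": ["65 102 00", "65 102 00", "66 107 00"],
        "product_id": [210190062, 210190062, 210189987],
        "Season": ["2018_2", "2018_2", "2018_4"],
        "Sales": [1000, 1000, 1500],
        "Score": [3, 5, 10],
    },
    index=pd.Index([1, 2, 89], name="Index"),
)

cols = ["classcode", "product_id", "Season", "Sales"]

# Sort by the keys and then by Score, so the max Score is the last row of each group;
# drop_duplicates on the keys alone then keeps exactly that row.
deduped = df.sort_values(cols + ["Score"]).drop_duplicates(cols, keep="last")

total_before = int(df["Sales"].sum())      # counts the 1000 sale twice
total_after = int(deduped["Sales"].sum())  # counts each sale once
print(total_before, total_after)
```

Using `keep='first'` with the same sort would retain the minimum-score row of each group instead.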
