Pandas groupby, filter and aggregate

Question

I have the following dataframe in pandas:

employee_name   age location    salary
Harish          31  Mumbai      450000
Marina          30  Mumbai      600000
Meena           31  Pune        750000
Sachin          32  Mumbai      1200000
Tarun           27  Mumbai      1400000
Mahesh          41  Pune        1500000
Satish          42  Delhi       650000
Heena           34  Delhi       800000

From this dataframe, I want to get, for each location, the employee aged > 30 and < 35 who earns the highest salary.

The dataframe I want is:

employee_name       age     location     salary
Sachin              32      Mumbai       1200000
Meena               31      Pune         750000
Heena               34      Delhi        800000

I tried the following in pandas, but it raises an error:

df.groupby('location').filter(lambda x : (x['age'] > 30) & (x['age'] < 35))['salary'].max()

How can I do this in pandas?
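The error comes from how `DataFrameGroupBy.filter` works: the function is called once per group and must return a single boolean, which decides whether the whole group is kept or dropped. A row-wise mask such as `(x['age'] > 30) & (x['age'] < 35)` is a boolean Series, so pandas raises a `TypeError`. The sketch below rebuilds the question's table and shows both behaviours; the helper name `in_age_band` is hypothetical, and the exact error message may vary between pandas versions.

```python
import pandas as pd

# Rebuild the question's table
df = pd.DataFrame(
    {
        "employee_name": ["Harish", "Marina", "Meena", "Sachin", "Tarun", "Mahesh", "Satish", "Heena"],
        "age": [31, 30, 31, 32, 27, 41, 42, 34],
        "location": ["Mumbai", "Mumbai", "Pune", "Mumbai", "Mumbai", "Pune", "Delhi", "Delhi"],
        "salary": [450000, 600000, 750000, 1200000, 1400000, 1500000, 650000, 800000],
    }
)


def in_age_band(group):
    """Hypothetical helper: row-wise mask for 30 < age < 35 (a boolean Series)."""
    return (group["age"] > 30) & (group["age"] < 35)


# Passing a row-wise mask, as in the question, is rejected by groupby().filter
try:
    df.groupby("location").filter(in_age_band)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Reducing the mask to one bool per group is accepted, but it keeps or drops
# entire groups: every location has someone aged 31-34, so all rows survive
kept = df.groupby("location").filter(lambda group: in_age_band(group).any())
print(len(kept))
```

So `filter` alone cannot select individual rows; the row-level filtering has to happen with a boolean mask before (or instead of) the groupby, as the answers below do.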

Answer 1

You can filter first, then sort by salary and keep the last (highest-paid) row for each location:

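(df.loc[df['age'].between(31, 34)]            # ages 31-34 inclusive, i.e. 30 < age < 35
   .sort_values('salary')                     # ascending: each location's top earner ends up last
   .drop_duplicates('location', keep='last')  # keep only that last row per location
)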

Output:

  employee_name  age location   salary
2         Meena   31     Pune   750000
7         Heena   34    Delhi   800000
3        Sachin   32   Mumbai  1200000
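For anyone who wants to run this answer end to end, here is a self-contained sketch that rebuilds the question's table and applies the same filter, sort and `drop_duplicates` chain. It relies on `Series.between` being inclusive on both ends by default, which makes `between(31, 34)` equivalent to `age > 30 & age < 35` for integer ages.

```python
import pandas as pd

# Rebuild the question's table
df = pd.DataFrame(
    {
        "employee_name": ["Harish", "Marina", "Meena", "Sachin", "Tarun", "Mahesh", "Satish", "Heena"],
        "age": [31, 30, 31, 32, 27, 41, 42, 34],
        "location": ["Mumbai", "Mumbai", "Pune", "Mumbai", "Mumbai", "Pune", "Delhi", "Delhi"],
        "salary": [450000, 600000, 750000, 1200000, 1400000, 1500000, 650000, 800000],
    }
)

result = (
    df.loc[df["age"].between(31, 34)]  # 31-34 inclusive, same as 30 < age < 35 for integers
    .sort_values("salary")  # ascending, so each location's top earner comes last
    .drop_duplicates("location", keep="last")  # keep that last row per location
)
print(result)
```

The rows come out in ascending salary order rather than grouped by location, which is why the output above lists Pune, Delhi, Mumbai.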

Answer 2

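Try idxmax. Note that groupby().filter does not work here: it keeps or drops whole groups, and its function must return a single boolean per group rather than a row-wise mask.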

mask = df['age'].between(31, 34)                        # ages 31-34 inclusive
df.loc[df[mask].groupby('location')['salary'].idxmax()]  # row label of each location's top salary

Output:

  employee_name  age location   salary
7         Heena   34    Delhi   800000
3        Sachin   32   Mumbai  1200000
2         Meena   31     Pune   750000
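A self-contained version of the `idxmax` approach, rebuilding the question's table, is sketched below. `idxmax` returns index labels from the original frame (one per location), which is why they can be passed straight back to `df.loc`; this assumes the index labels are unique, as they are with the default `RangeIndex`.

```python
import pandas as pd

# Rebuild the question's table
df = pd.DataFrame(
    {
        "employee_name": ["Harish", "Marina", "Meena", "Sachin", "Tarun", "Mahesh", "Satish", "Heena"],
        "age": [31, 30, 31, 32, 27, 41, 42, 34],
        "location": ["Mumbai", "Mumbai", "Pune", "Mumbai", "Mumbai", "Pune", "Delhi", "Delhi"],
        "salary": [450000, 600000, 750000, 1200000, 1400000, 1500000, 650000, 800000],
    }
)

# Restrict to the age band, then find the index label of the top salary per location
in_band = df[df["age"].between(31, 34)]
top_labels = in_band.groupby("location")["salary"].idxmax()  # Series: location -> row label
result = df.loc[top_labels]
print(result)
```

The result follows the sorted group keys (Delhi, Mumbai, Pune). If two employees in the same location tie for the top salary, `idxmax` returns only the first one.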

Answer 3

You can try this option using query:

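df = df.query('age > 30 & age < 35')
# Sort by salary, then keep the last (highest-paid) row for each location;
# deduplicating on "age" instead would keep one row per age, not per location
df = df.sort_values('salary').drop_duplicates(subset='location', keep='last')
print(df)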
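Whether the deduplication key is `age` or `location` matters. Deduplicating on `age` happens to return the desired rows for the question's data, but it keeps one row per age rather than one per location, and it never looks at salary. The sketch below makes that visible by appending one hypothetical employee ("Asha", 32, Pune, 500000) that is not in the question; the extra row exists only for illustration.

```python
import pandas as pd

# The question's table plus one hypothetical extra row ("Asha"), added only to
# show a case where two people of the same age work in different locations
df = pd.DataFrame(
    {
        "employee_name": ["Harish", "Marina", "Meena", "Sachin", "Tarun", "Mahesh", "Satish", "Heena", "Asha"],
        "age": [31, 30, 31, 32, 27, 41, 42, 34, 32],
        "location": ["Mumbai", "Mumbai", "Pune", "Mumbai", "Mumbai", "Pune", "Delhi", "Delhi", "Pune"],
        "salary": [450000, 600000, 750000, 1200000, 1400000, 1500000, 650000, 800000, 500000],
    }
)

subset = df.query("age > 30 & age < 35")

# One row per age: Sachin (32, Mumbai) is dropped in favour of Asha (32, Pune)
by_age = subset.drop_duplicates(subset="age", keep="last")

# One row per location, after sorting so the highest salary comes last
by_location = subset.sort_values("salary").drop_duplicates(subset="location", keep="last")

print(by_age[["employee_name", "location", "salary"]])
print(by_location[["employee_name", "location", "salary"]])
```

With the extra row, the age-based version loses Mumbai entirely and returns two Pune rows, while the location-based version still returns Meena, Heena and Sachin.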

3 answers

Answer 1
2 votes, accepted, 2020-08-27 13:41:00

Answer 2
1 vote, 2020-08-27 14:01:55

Answer 3
0 votes, 2020-08-27 14:13:20
