groupby - select only certain groups

Question

I have the DataFrame below, and I want to select the services that have fewer than 2 "Healthy" instances. In this case, I need the rows for EmailService, UserService and NotificationService.

              CPU              Service  Memory   Status
IP                                                     
10.22.11.150   13       StorageService      55  Healthy
10.22.11.90    23       StorageService      19  Healthy
10.22.11.91    10         EmailService      44  Healthy
10.22.11.92    69          UserService       1  Healthy
10.22.11.93    63  NotificationService      81  Healthy
10.22.11.93    87  NotificationService      98  Unhealthy

I think I need this grouping:

grouped = servers_df.groupby('Service')

but I am not sure how to count the values in the Status column and then select rows based on that count.

Answer 1

Use transform with a lambda function to count the Healthy rows in each group and compare that count with 2, then filter by boolean indexing:

df = df[df.groupby('Service')['Status'].transform(lambda x: (x=='Healthy').sum() < 2)]
print (df)
             CPU              Service  Memory     Status
IP                                                      
10.22.11.91   10         EmailService      44    Healthy
10.22.11.92   69          UserService       1    Healthy
10.22.11.93   63  NotificationService      81    Healthy
10.22.11.93   87  NotificationService      98  Unhealthy
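To see what the transform step produces, the one-liner can be split into intermediate steps. The following is a minimal sketch that rebuilds the sample data from the question; the names healthy_per_row, mask and result are introduced here for illustration only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "CPU": [13, 23, 10, 69, 63, 87],
        "Service": ["StorageService", "StorageService", "EmailService",
                    "UserService", "NotificationService", "NotificationService"],
        "Memory": [55, 19, 44, 1, 81, 98],
        "Status": ["Healthy", "Healthy", "Healthy",
                   "Healthy", "Healthy", "Unhealthy"],
    },
    index=pd.Index(["10.22.11.150", "10.22.11.90", "10.22.11.91",
                    "10.22.11.92", "10.22.11.93", "10.22.11.93"], name="IP"),
)

# transform broadcasts the per-group Healthy count back to every row,
# so the result has the same length and index as df.
healthy_per_row = (
    df.groupby("Service")["Status"]
      .transform(lambda x: (x == "Healthy").sum())
      .astype(int)
)

# Boolean mask: True for rows whose service has fewer than 2 Healthy rows.
mask = healthy_per_row < 2
result = df[mask]
print(result)
```

Because the mask is aligned row by row with df, every row of a selected service is kept, including the Unhealthy row of NotificationService.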

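If you only want to check for a single Healthy row per group, use duplicated with keep=False to flag every duplicated (Service, Status) pair, chain it with the condition Status == 'Healthy' so that repeated Unhealthy rows are not flagged, then invert the mask with ~ and filter by boolean indexing again: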

df = df[~(df.duplicated(['Service','Status'], keep=False) & (df['Status'] == 'Healthy'))]
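The mask built by duplicated can be inspected on its own. The sketch below rebuilds the sample data and then appends one hypothetical row (an Unhealthy StorageService instance, with made-up IP, CPU and Memory values) to show a case where this approach and the transform approach appear to differ: duplicated only drops the repeated Healthy rows, while transform drops every row of the service.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "CPU": [13, 23, 10, 69, 63, 87],
        "Service": ["StorageService", "StorageService", "EmailService",
                    "UserService", "NotificationService", "NotificationService"],
        "Memory": [55, 19, 44, 1, 81, 98],
        "Status": ["Healthy", "Healthy", "Healthy",
                   "Healthy", "Healthy", "Unhealthy"],
    },
    index=pd.Index(["10.22.11.150", "10.22.11.90", "10.22.11.91",
                    "10.22.11.92", "10.22.11.93", "10.22.11.93"], name="IP"),
)

# True only for Healthy rows whose (Service, Status) pair occurs more than once.
repeated_healthy = (
    df.duplicated(["Service", "Status"], keep=False) & (df["Status"] == "Healthy")
)
kept = df[~repeated_healthy]

# Hypothetical extra row, not part of the question's data.
extra = pd.DataFrame(
    {"CPU": [5], "Service": ["StorageService"], "Memory": [7], "Status": ["Unhealthy"]},
    index=pd.Index(["10.22.11.94"], name="IP"),
)
df2 = pd.concat([df, extra])

dup_mask = df2.duplicated(["Service", "Status"], keep=False) & (df2["Status"] == "Healthy")
kept_by_duplicated = df2[~dup_mask]

few_healthy = (
    df2.groupby("Service")["Status"]
       .transform(lambda x: (x == "Healthy").sum() < 2)
       .astype(bool)
)
kept_by_transform = df2[few_healthy]

print(kept_by_duplicated)
print(kept_by_transform)
```

On the question's data both approaches return the same four rows; with the extra row, the duplicated version keeps the Unhealthy StorageService row and the transform version does not.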

Answer 2

You can also use filter.

df.groupby("Service").filter(lambda x: len(x[x.Status == "Healthy"]) < 2)

According to jezrael's timings, this approach may be slower.
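If speed matters, one option is to avoid calling a Python lambda once per group. The sketch below is an assumption about what might help, not a benchmarked result: it builds a helper column (named is_healthy here purely for illustration) and uses the built-in "sum" aggregation inside transform, then checks that it selects the same rows as filter on the sample data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "CPU": [13, 23, 10, 69, 63, 87],
        "Service": ["StorageService", "StorageService", "EmailService",
                    "UserService", "NotificationService", "NotificationService"],
        "Memory": [55, 19, 44, 1, 81, 98],
        "Status": ["Healthy", "Healthy", "Healthy",
                   "Healthy", "Healthy", "Unhealthy"],
    },
    index=pd.Index(["10.22.11.150", "10.22.11.90", "10.22.11.91",
                    "10.22.11.92", "10.22.11.93", "10.22.11.93"], name="IP"),
)

# 1 for Healthy rows, 0 otherwise; cast to int so the group sum is an integer count.
flags = df.assign(is_healthy=df["Status"].eq("Healthy").astype(int))

# Built-in "sum" aggregation, broadcast back to each row by transform.
healthy_count = flags.groupby("Service")["is_healthy"].transform("sum")
via_transform = df[healthy_count < 2]

# The filter version from this answer, for comparison.
via_filter = df.groupby("Service").filter(
    lambda g: (g["Status"] == "Healthy").sum() < 2
)

print(via_transform)
print(via_filter)
```

Whether this is actually faster depends on the number of groups and the pandas version, so it is worth timing on real data before relying on it.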

Another approach: use apply (adapted from jezrael's transform solution)

df.groupby('Service').apply(
                   lambda x: x if (x.Status == 'Healthy').sum() < 2 else None)


                        IP         CPU  Service              Memory Status
Service                     
EmailService        2   10.22.11.91 10  EmailService         44 Healthy
NotificationService 4   10.22.11.93 63  NotificationService  81 Healthy
                    5   10.22.11.93 87  NotificationService  98 Unhealthy
UserService         3   10.22.11.92 69  UserService          1  Healthy

Answer 3

IIUC

s=df[df.Status=='Healthy'].groupby('Service').Service.count().lt(2)
df.loc[df.Service.isin(s[s].index)]

    IP          CPU Service             Memory  Status
2   10.22.11.91 10  EmailService        44      Healthy
3   10.22.11.92 69  UserService         1       Healthy
4   10.22.11.93 63  NotificationService 81      Healthy
5   10.22.11.93 87  NotificationService 98      Unhealthy
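One thing to watch with this version: rows are filtered to Healthy before grouping, so a service with no Healthy rows at all never appears in s and is not selected, even though it has fewer than 2 healthy instances. The sketch below illustrates this with a hypothetical CacheService row that is not in the question's data, and shows one possible way to count per service without dropping such groups (the helper column name healthy is an assumption made for this example).

```python
import pandas as pd

# Service and Status columns from the question, plus one hypothetical
# service (CacheService) that has only an Unhealthy instance.
df = pd.DataFrame(
    {
        "Service": ["StorageService", "StorageService", "EmailService",
                    "UserService", "NotificationService", "NotificationService",
                    "CacheService"],
        "Status": ["Healthy", "Healthy", "Healthy",
                   "Healthy", "Healthy", "Unhealthy",
                   "Unhealthy"],
    }
)

# Approach from this answer: count only the Healthy rows.
s = df[df.Status == "Healthy"].groupby("Service").Service.count().lt(2)
selected_a = df.loc[df.Service.isin(s[s].index)]

# Alternative sketch: sum a 0/1 flag per service, so services with
# zero Healthy rows still get a count (of 0).
counts = (
    df.assign(healthy=df["Status"].eq("Healthy").astype(int))
      .groupby("Service")["healthy"]
      .sum()
)
few = counts[counts < 2].index
selected_b = df[df["Service"].isin(few)]

print(selected_a)
print(selected_b)
```

On the question's original data, where every service has at least one Healthy row, both versions select the same rows.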

Answer 1
Score 3, accepted, 2018-02-05 21:25:22

Answer 2
Score 1, 2018-02-05 21:27:00

Answer 3
Score 1, 2018-02-05 21:33:36
