
groupby - selecting only certain groups

I have the DataFrame below, and I want to select the services that have fewer than 2 "Healthy" instances. In this case, I need the rows for (EmailService, UserService, NotificationService).

              CPU              Service  Memory   Status
IP                                                     
10.22.11.150   13       StorageService      55  Healthy
10.22.11.90    23       StorageService      19  Healthy
10.22.11.91    10         EmailService      44  Healthy
10.22.11.92    69          UserService       1  Healthy
10.22.11.93    63  NotificationService      81  Healthy
10.22.11.93    87  NotificationService      98  Unhealthy

I think I need this grouping,

grouped = servers_df.groupby('Service')

but I'm not sure how to count values in the "Status" column and then select rows based on that count.

Use transform with a lambda function to count the Healthy rows per group and compare the count with 2, then filter by boolean indexing:

df = df[df.groupby('Service')['Status'].transform(lambda x: (x=='Healthy').sum() < 2)]
print (df)
             CPU              Service  Memory     Status
IP                                                      
10.22.11.91   10         EmailService      44    Healthy
10.22.11.92   69          UserService       1    Healthy
10.22.11.93   63  NotificationService      81    Healthy
10.22.11.93   87  NotificationService      98  Unhealthy
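For readers who want to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same idea. The variable names `is_healthy` and `healthy_per_service` are illustrative, and the lambda is swapped for `transform("sum")` on a boolean Series, which should give the same per-row counts.

```python
import pandas as pd

# Rebuild the question's sample data; the name servers_df follows the question.
servers_df = pd.DataFrame(
    {
        "CPU": [13, 23, 10, 69, 63, 87],
        "Service": ["StorageService", "StorageService", "EmailService",
                    "UserService", "NotificationService", "NotificationService"],
        "Memory": [55, 19, 44, 1, 81, 98],
        "Status": ["Healthy", "Healthy", "Healthy",
                   "Healthy", "Healthy", "Unhealthy"],
    },
    index=pd.Index(
        ["10.22.11.150", "10.22.11.90", "10.22.11.91",
         "10.22.11.92", "10.22.11.93", "10.22.11.93"],
        name="IP",
    ),
)

# Boolean flag per row, then the number of Healthy rows in that row's service,
# broadcast back to every row of the service.
is_healthy = servers_df["Status"].eq("Healthy")
healthy_per_service = is_healthy.groupby(servers_df["Service"]).transform("sum")

# Keep only rows whose service has fewer than 2 Healthy instances.
result = servers_df[healthy_per_service < 2]
print(result)
```

Because transform returns a value for every original row, the mask lines up with the DataFrame and all rows of a qualifying service are kept, including its Unhealthy ones.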

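If you only want to check for a single Healthy value per group, use duplicated with keep=False to flag all duplicates, chain it with a comparison against Healthy so that repeated Unhealthy rows are not filtered out, then invert the condition with ~ and filter by boolean indexing again: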

df = df[~(df.duplicated(['Service','Status'], keep=False) & (df['Status'] == 'Healthy'))]
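The two masks can be easier to follow when built separately. The sketch below does that on a reduced frame; the names `repeated_pair` and `surplus_healthy` are illustrative, and the Unhealthy StorageService row is a hypothetical addition that is not in the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "Service": ["StorageService", "StorageService", "StorageService",
                "EmailService", "NotificationService", "NotificationService"],
    # Row 2 (StorageService / Unhealthy) is hypothetical, added for illustration.
    "Status": ["Healthy", "Healthy", "Unhealthy",
               "Healthy", "Healthy", "Unhealthy"],
})

# True for every row whose (Service, Status) pair occurs more than once.
repeated_pair = df.duplicated(["Service", "Status"], keep=False)

# Restrict to Healthy so that repeated Unhealthy rows are never dropped.
surplus_healthy = repeated_pair & df["Status"].eq("Healthy")

# Invert the mask: drop only the Healthy rows of services with 2+ Healthy rows.
kept = df[~surplus_healthy]
print(kept)
```

Note that this mask works row by row rather than per service: with the hypothetical extra row, the Unhealthy StorageService row is kept even though StorageService has two Healthy instances, so the result can differ from the transform approach.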

You can also use filter:

df.groupby("Service").filter(lambda x: len(x[x.Status == "Healthy"]) < 2)

According to jezrael's experiments, this approach may be slower.
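As a rough sanity check that filter and the transform-based mask pick the same rows, the sketch below runs both on a reduced copy of the sample data. The names `via_filter` and `via_transform` are illustrative, and no timing is measured here.

```python
import pandas as pd

df = pd.DataFrame({
    "Service": ["StorageService", "StorageService", "EmailService",
                "UserService", "NotificationService", "NotificationService"],
    "Status": ["Healthy", "Healthy", "Healthy",
               "Healthy", "Healthy", "Unhealthy"],
})

# filter: the function is called once per group and must return a single bool.
via_filter = df.groupby("Service").filter(
    lambda g: bool(g["Status"].eq("Healthy").sum() < 2)
)

# transform: build a row-aligned count, then apply it as a boolean mask.
healthy_count = df["Status"].eq("Healthy").groupby(df["Service"]).transform("sum")
via_transform = df[healthy_count < 2]

# Compare which row labels each approach selected.
print(sorted(via_filter.index) == sorted(via_transform.index))
```

The filter version calls a Python function per group, which is one plausible reason it could be slower on data with many groups.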

Another approach: use apply (adapted from jezrael's transform solution)

df.groupby('Service').apply(
                   lambda x: x if (x.Status == 'Healthy').sum() < 2 else None)


                        IP         CPU  Service              Memory Status
Service                     
EmailService        2   10.22.11.91 10  EmailService         44 Healthy
NotificationService 4   10.22.11.93 63  NotificationService  81 Healthy
                    5   10.22.11.93 87  NotificationService  98 Unhealthy
UserService         3   10.22.11.92 69  UserService          1  Healthy

IIUC

s=df[df.Status=='Healthy'].groupby('Service').Service.count().lt(2)
df.loc[df.Service.isin(s[s].index)]

    IP          CPU Service             Memory  Status
2   10.22.11.91 10  EmailService        44      Healthy
3   10.22.11.92 69  UserService         1       Healthy
4   10.22.11.93 63  NotificationService 81      Healthy
5   10.22.11.93 87  NotificationService 98      Unhealthy
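One thing to watch with this approach: because the frame is filtered to Healthy rows before grouping, a service with no Healthy rows at all never appears in the counts. The sketch below shows this with a hypothetical CacheService that is not in the question's data, and one possible variant that counts on a boolean column instead. Whether zero-Healthy services should be selected depends on what the question intends, so treat the variant as an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    # CacheService is hypothetical: a service with no Healthy instance.
    "Service": ["StorageService", "StorageService", "EmailService", "CacheService"],
    "Status": ["Healthy", "Healthy", "Healthy", "Unhealthy"],
})

# As in the answer: count Healthy rows per service.
# Services without any Healthy row are absent from this result.
healthy_counts = df[df["Status"] == "Healthy"].groupby("Service")["Service"].count()
few_healthy = healthy_counts[healthy_counts.lt(2)].index
print(df[df["Service"].isin(few_healthy)])

# Variant: sum a boolean column per service, so zero-Healthy services count as 0.
counts_with_zero = df["Status"].eq("Healthy").groupby(df["Service"]).sum()
few_or_none = counts_with_zero[counts_with_zero.lt(2)].index
print(df[df["Service"].isin(few_or_none)])
```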
