
Pandas Groupby apply function to group

I have this function:

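import numpy as np

def is_outlier(points, thresh=3.5):
    # Work on a plain float array so a pandas Series can be passed in
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:
        points = points[:, None]
    median = np.nanmedian(points, axis=0)
    # Euclidean distance of each point from the median
    diff = np.sum((points - median)**2, axis=-1)
    diff = np.sqrt(diff)
    med_abs_deviation = np.nanmedian(diff)

    # Note: a zero deviation makes the score inf or nan (with a NumPy warning)
    modified_z_score = 0.6745 * (diff / med_abs_deviation)

    return modified_z_score > thresh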
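
As a rough check of what the function computes for a single column, here is the same arithmetic done by hand with only the standard library. This is a sketch, not the function itself: it uses statistics.median instead of np.nanmedian (so it assumes no missing values), and the input is the COMP_FINAL_TAT values of the RAD rows from the data below. For one column, the Euclidean distance reduces to the absolute difference from the median.

```python
from statistics import median

# COMP_FINAL_TAT values of the RAD rows (index 3, 4, 5, 6, 8, 9, 10, 13)
values = [245, 240, 234, 236, 239, 251, 147, 151]

med = median(values)                      # median of the group: 237.5
abs_dev = [abs(v - med) for v in values]  # distance of each value from the median
mad = median(abs_dev)                     # median absolute deviation: 5.5

# Modified z-score: 0.6745 * |x - median| / MAD, flagged when above 3.5
scores = [0.6745 * (d / mad) for d in abs_dev]
flags = [s > 3.5 for s in scores]

print(flags)
# [False, False, False, False, False, False, True, True]
```

Only 147 and 151 are flagged, which agrees with rows 10 and 13 in the answer's output.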

I want to group by the MODALITY column, apply the function above to the other 4 columns within each group, and add a new column holding the True/False result that identifies the outliers.

Data:

   MODALITY  COMP_FINAL_TAT  ORD_FINAL_TAT  UNREAD_FINAL_TAT  ORD_UNREAD_TAT
0       MRI              12            394                 5             389
1        CT             233            240               229              11
2        CT             204            205               188              16
3       RAD             245            302               243              59
4       RAD             240            297               238              59
5       RAD             234            291               232              59
6       RAD             236            294               235              59
7       MRI             170            -10                63             -73
8       RAD             239            296               237              59
9       RAD             251            256               251               4
10      RAD             147            176               146              29
11      MRI              25            -62                18             -80
12      MRI             527           -482               518           -1000
13      RAD             151            231               150              81

I thought of doing this:

outlierdf = df.groupby(['MODALITY'])[['COMP_FINAL_TAT', 'ORD_FINAL_TAT', 'UNREAD_FINAL_TAT', 'ORD_UNREAD_TAT']].transform(is_outlier)

I can't work out how to add the True/False outlier result as a new column.

Use DataFrame.join with DataFrame.add_suffix to add four new columns, one outlier flag for each of the 4 selected columns. transform returns a result indexed like the original DataFrame, so join lines the flags up row by row:

# Double brackets select a list of columns (a bare tuple of names fails in recent pandas)
df = df.join(df.groupby('MODALITY')[['COMP_FINAL_TAT', 'ORD_FINAL_TAT',
                                     'UNREAD_FINAL_TAT', 'ORD_UNREAD_TAT']]
               .transform(is_outlier).add_suffix('_outlier'))
print(df)
   MODALITY  COMP_FINAL_TAT  ORD_FINAL_TAT  UNREAD_FINAL_TAT  ORD_UNREAD_TAT  \
0       MRI              12            394                 5             389   
1        CT             233            240               229              11   
2        CT             204            205               188              16   
3       RAD             245            302               243              59   
4       RAD             240            297               238              59   
5       RAD             234            291               232              59   
6       RAD             236            294               235              59   
7       MRI             170            -10                63             -73   
8       RAD             239            296               237              59   
9       RAD             251            256               251               4   
10      RAD             147            176               146              29   
11      MRI              25            -62                18             -80   
12      MRI             527           -482               518           -1000   
13      RAD             151            231               150              81   

   COMP_FINAL_TAT_outlier ORD_FINAL_TAT_outlier  UNREAD_FINAL_TAT_outlier  \
0                   False                 False                    False   
1                   False                 False                    False   
2                   False                 False                    False   
3                   False                 False                    False   
4                   False                 False                    False   
5                   False                 False                    False   
6                   False                 False                    False   
7                   False                 False                    False   
8                   False                 False                    False   
9                   False                  True                    False   
10                   True                  True                     True   
11                  False                 False                    False   
12                   True                 False                     True   
13                   True                  True                     True   

   ORD_UNREAD_TAT_outlier  
0                   False  
1                   False  
2                   False  
3                   False  
4                   False  
5                   False  
6                   False  
7                   False  
8                   False  
9                    True  
10                   True  
11                  False  
12                  False  
13                   True  
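
For anyone who wants to reproduce the flags above, here is a self-contained sketch built on the question's data. Two things in it are my own assumptions rather than part of the answer: is_outlier is trimmed to a 1-D-only variant, and the columns are transformed one at a time through a dict comprehension (the `cols` and `flags` names are just for illustration). Per-column transform hands the function one Series per group, which I expect to behave the same across pandas versions.

```python
import numpy as np
import pandas as pd

def is_outlier(points, thresh=3.5):
    """Flag values whose modified z-score exceeds thresh (1-D input only)."""
    points = np.asarray(points, dtype=float)
    diff = np.abs(points - np.nanmedian(points))
    med_abs_deviation = np.nanmedian(diff)
    # A zero deviation yields inf (flagged) or nan (not flagged); silence the warnings
    with np.errstate(divide="ignore", invalid="ignore"):
        return 0.6745 * (diff / med_abs_deviation) > thresh

df = pd.DataFrame({
    "MODALITY": ["MRI", "CT", "CT", "RAD", "RAD", "RAD", "RAD",
                 "MRI", "RAD", "RAD", "RAD", "MRI", "MRI", "RAD"],
    "COMP_FINAL_TAT": [12, 233, 204, 245, 240, 234, 236,
                       170, 239, 251, 147, 25, 527, 151],
    "ORD_FINAL_TAT": [394, 240, 205, 302, 297, 291, 294,
                      -10, 296, 256, 176, -62, -482, 231],
    "UNREAD_FINAL_TAT": [5, 229, 188, 243, 238, 232, 235,
                         63, 237, 251, 146, 18, 518, 150],
    "ORD_UNREAD_TAT": [389, 11, 16, 59, 59, 59, 59,
                       -73, 59, 4, 29, -80, -1000, 81],
})

cols = ["COMP_FINAL_TAT", "ORD_FINAL_TAT", "UNREAD_FINAL_TAT", "ORD_UNREAD_TAT"]

# One boolean Series per column, each computed within its MODALITY group
flags = pd.DataFrame({
    col + "_outlier": df.groupby("MODALITY")[col].transform(is_outlier)
    for col in cols
})

# The flags share df's index, so join aligns them row by row
result = df.join(flags)
print(result.filter(like="_outlier"))
```

In the RAD group, ORD_UNREAD_TAT is 59 for five of the eight rows, so its median absolute deviation is 0. That is why every RAD row that differs from 59 (rows 9, 10 and 13) is flagged in that column.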

If you want a single column that is True when any of the four values in a row is an outlier, use DataFrame.any with axis=1:

# any(axis=1) collapses the four per-column flags into one flag per row
df = df.join(df.groupby('MODALITY')[['COMP_FINAL_TAT', 'ORD_FINAL_TAT',
                                     'UNREAD_FINAL_TAT', 'ORD_UNREAD_TAT']]
               .transform(is_outlier).any(axis=1).rename('outlier'))
print(df)


   MODALITY  COMP_FINAL_TAT  ORD_FINAL_TAT  UNREAD_FINAL_TAT  ORD_UNREAD_TAT  \
0       MRI              12            394                 5             389   
1        CT             233            240               229              11   
2        CT             204            205               188              16   
3       RAD             245            302               243              59   
4       RAD             240            297               238              59   
5       RAD             234            291               232              59   
6       RAD             236            294               235              59   
7       MRI             170            -10                63             -73   
8       RAD             239            296               237              59   
9       RAD             251            256               251               4   
10      RAD             147            176               146              29   
11      MRI              25            -62                18             -80   
12      MRI             527           -482               518           -1000   
13      RAD             151            231               150              81   

    outlier  
0     False  
1     False  
2     False  
3     False  
4     False  
5     False  
6     False  
7     False  
8     False  
9      True  
10     True  
11    False  
12     True  
13     True  
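
To see the any(axis=1) and rename step on its own, here is a small sketch that uses the per-column flags of rows 9, 11 and 12 from the earlier output. The `flags` and `small` frames are typed in by hand for illustration only.

```python
import pandas as pd

# Per-column flags for rows 9, 11 and 12, copied from the output further up
flags = pd.DataFrame(
    {
        "COMP_FINAL_TAT_outlier": [False, False, True],
        "ORD_FINAL_TAT_outlier": [True, False, False],
        "UNREAD_FINAL_TAT_outlier": [False, False, True],
        "ORD_UNREAD_TAT_outlier": [True, False, False],
    },
    index=[9, 11, 12],
)

# True when at least one column in the row is True; the Series needs a name to be joined
row_flag = flags.any(axis=1).rename("outlier")

small = pd.DataFrame({"MODALITY": ["RAD", "MRI", "MRI"]}, index=[9, 11, 12])
out = small.join(row_flag)  # join aligns on the shared index
print(out)
```

Rows 9 and 12 come out True and row 11 False, matching the outlier column above.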
