Pandas Groupby apply function to group
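I have this function:
import numpy as np

def is_outlier(points, thresh=3.5):
    # Treat 1-D input as a single column of observations
    if len(points.shape) == 1:
        points = points[:, None]
    median = np.nanmedian(points, axis=0)
    # Euclidean distance of each observation from the median
    diff = np.sum((points - median)**2, axis=-1)
    diff = np.sqrt(diff)
    med_abs_deviation = np.nanmedian(diff)
    # Modified z-score based on the median absolute deviation
    modified_z_score = 0.6745 * (diff / med_abs_deviation)
    return modified_z_score > thresh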
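For one-dimensional input the function reduces to the modified z-score, 0.6745 * |x - median| / MAD, compared against the threshold. The sketch below is a simplified 1-D version written only to illustrate that reduction; the name mad_outlier_flags is hypothetical, and the sample values are the RAD rows of COMP_FINAL_TAT from the data in this question.

```python
import numpy as np

def mad_outlier_flags(values, thresh=3.5):
    """Hypothetical 1-D helper: flag values whose modified z-score exceeds thresh."""
    x = np.asarray(values, dtype=float)
    abs_dev = np.abs(x - np.nanmedian(x))   # distance from the median
    mad = np.nanmedian(abs_dev)             # median absolute deviation
    return 0.6745 * abs_dev / mad > thresh

# RAD values of COMP_FINAL_TAT from the question's data
rad_comp = [245, 240, 234, 236, 239, 251, 147, 151]
flags = mad_outlier_flags(rad_comp)
print(flags)  # only the last two values (147 and 151) are flagged
```

Note that if more than half of the values are identical the MAD is 0, so the division produces inf or nan scores; this sketch does not guard against that case.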
I want to group by the MODALITY column, select the other four columns, and create a new column holding the True/False result of the function above, which flags the outliers.
Data:
MODALITY COMP_FINAL_TAT ORD_FINAL_TAT UNREAD_FINAL_TAT ORD_UNREAD_TAT
0 MRI 12 394 5 389
1 CT 233 240 229 11
2 CT 204 205 188 16
3 RAD 245 302 243 59
4 RAD 240 297 238 59
5 RAD 234 291 232 59
6 RAD 236 294 235 59
7 MRI 170 -10 63 -73
8 RAD 239 296 237 59
9 RAD 251 256 251 4
10 RAD 147 176 146 29
11 MRI 25 -62 18 -80
12 MRI 527 -482 518 -1000
13 RAD 151 231 150 81
I have thought to do this: outlierdf = df.groupby(['MODALITY'])[['COMP_FINAL_TAT', 'ORD_FINAL_TAT', 'UNREAD_FINAL_TAT', 'ORD_UNREAD_TAT']].transform(is_outlier)
I can't seem to work out how to add the True/False outlier result as a new column.
Use DataFrame.join with DataFrame.add_suffix to create four new columns, one outlier flag for each of the 4 selected columns.
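# Select the columns with a list (double brackets); tuple-style selection
# on a groupby was removed in pandas 2.0.
df = df.join(df.groupby(['MODALITY'])[['COMP_FINAL_TAT', 'ORD_FINAL_TAT',
                                       'UNREAD_FINAL_TAT', 'ORD_UNREAD_TAT']]
               .transform(is_outlier).add_suffix('_outlier'))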
print(df)
MODALITY COMP_FINAL_TAT ORD_FINAL_TAT UNREAD_FINAL_TAT ORD_UNREAD_TAT \
0 MRI 12 394 5 389
1 CT 233 240 229 11
2 CT 204 205 188 16
3 RAD 245 302 243 59
4 RAD 240 297 238 59
5 RAD 234 291 232 59
6 RAD 236 294 235 59
7 MRI 170 -10 63 -73
8 RAD 239 296 237 59
9 RAD 251 256 251 4
10 RAD 147 176 146 29
11 MRI 25 -62 18 -80
12 MRI 527 -482 518 -1000
13 RAD 151 231 150 81
COMP_FINAL_TAT_outlier ORD_FINAL_TAT_outlier UNREAD_FINAL_TAT_outlier \
0 False False False
1 False False False
2 False False False
3 False False False
4 False False False
5 False False False
6 False False False
7 False False False
8 False False False
9 False True False
10 True True True
11 False False False
12 True False True
13 True True True
ORD_UNREAD_TAT_outlier
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 True
10 True
11 False
12 False
13 True
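For reference, here is a self-contained sketch of the same join approach that rebuilds the question's data and should run on recent pandas versions. It makes two assumptions that are not part of the original answer: the helper is_outlier_col is a hypothetical column-wise rewrite of is_outlier that returns a Series aligned to the group's index, and NumPy warnings are silenced for groups whose MAD is 0 (as happens for ORD_UNREAD_TAT in the RAD group).

```python
import numpy as np
import pandas as pd

def is_outlier_col(col, thresh=3.5):
    """Hypothetical column-wise variant: modified z-score test on one column."""
    x = np.asarray(col, dtype=float)
    abs_dev = np.abs(x - np.nanmedian(x))
    mad = np.nanmedian(abs_dev)
    # A MAD of 0 gives inf (flagged) or nan (not flagged); hide the warnings.
    with np.errstate(divide="ignore", invalid="ignore"):
        score = 0.6745 * abs_dev / mad
    return pd.Series(score > thresh, index=col.index)

df = pd.DataFrame({
    "MODALITY": ["MRI", "CT", "CT", "RAD", "RAD", "RAD", "RAD",
                 "MRI", "RAD", "RAD", "RAD", "MRI", "MRI", "RAD"],
    "COMP_FINAL_TAT": [12, 233, 204, 245, 240, 234, 236,
                       170, 239, 251, 147, 25, 527, 151],
    "ORD_FINAL_TAT": [394, 240, 205, 302, 297, 291, 294,
                      -10, 296, 256, 176, -62, -482, 231],
    "UNREAD_FINAL_TAT": [5, 229, 188, 243, 238, 232, 235,
                         63, 237, 251, 146, 18, 518, 150],
    "ORD_UNREAD_TAT": [389, 11, 16, 59, 59, 59, 59,
                       -73, 59, 4, 29, -80, -1000, 81],
})

cols = ["COMP_FINAL_TAT", "ORD_FINAL_TAT", "UNREAD_FINAL_TAT", "ORD_UNREAD_TAT"]

# transform keeps the original row index, so join aligns the flags row by row
flags = df.groupby("MODALITY")[cols].transform(is_outlier_col).add_suffix("_outlier")
out = df.join(flags)
print(out.filter(like="_outlier"))
```

Because transform returns a result indexed like the original frame, the join needs no key column; the suffix only keeps the flag columns from colliding with the source columns.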
If you want True when any value in a row is True, use DataFrame.any:
# any(axis=1) collapses the per-column flags into one boolean per row
df = df.join(df.groupby(['MODALITY'])[['COMP_FINAL_TAT', 'ORD_FINAL_TAT',
                                       'UNREAD_FINAL_TAT', 'ORD_UNREAD_TAT']]
               .transform(is_outlier).any(axis=1).rename('outlier'))
print(df)
MODALITY COMP_FINAL_TAT ORD_FINAL_TAT UNREAD_FINAL_TAT ORD_UNREAD_TAT \
0 MRI 12 394 5 389
1 CT 233 240 229 11
2 CT 204 205 188 16
3 RAD 245 302 243 59
4 RAD 240 297 238 59
5 RAD 234 291 232 59
6 RAD 236 294 235 59
7 MRI 170 -10 63 -73
8 RAD 239 296 237 59
9 RAD 251 256 251 4
10 RAD 147 176 146 29
11 MRI 25 -62 18 -80
12 MRI 527 -482 518 -1000
13 RAD 151 231 150 81
outlier
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 True
10 True
11 False
12 True
13 True
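To make the row-wise reduction concrete, here is a small sketch on a made-up boolean frame (the column names A_outlier, B_outlier and C_outlier are placeholders, not columns from the question). It shows any(axis=1) next to sum(axis=1), which counts how many columns flagged each row and may be useful if a single True is too strict a criterion.

```python
import pandas as pd

# Made-up per-column outlier flags for three rows
flags = pd.DataFrame({
    "A_outlier": [False, True, False],
    "B_outlier": [False, False, False],
    "C_outlier": [False, True, True],
})

# True if at least one column flags the row
row_flag = flags.any(axis=1).rename("outlier")

# Number of columns that flag each row
n_flagged = flags.sum(axis=1).rename("n_outlier_cols")

print(row_flag.tolist())   # [False, True, True]
print(n_flagged.tolist())  # [0, 2, 1]
```

Either Series can be attached with DataFrame.join, as in the answer, or assigned directly as a new column, since both keep the original row index.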