How to apply a function to a pandas DataFrame with group by
I have this function that I found on GitHub.
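def std_div(data, threshold=3):
    std = data.std()
    mean = data.mean()  # computed but never used below
    isOutlier = []
    for val in data:
        # Flags a value when its ratio to the standard deviation exceeds the threshold
        if val/std > threshold:
            isOutlier.append(True)
        else:
            isOutlier.append(False)
    return isOutlier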
I want to apply this to my DataFrame for each group (dept).
employee_id  dept   Salary
1            sales   10000
2            sales  110000
3            sales  120000
4            hr       5000
5            hr       6000
This works, but it calculates the standard deviation over the entire DataFrame instead of per dept.
df["std_div"]= df.from_dict(std_div(df.Salary))
You could do something along the lines of the following: group by the column of interest, then use a for loop to run the function on that column for each group.
for name, group in df.groupby('dept'):
    df.loc[group.index, 'outlier'] = std_div(group.Salary)
df
employee_id  dept   Salary  outlier
1            sales   10000  False
2            sales  110000  False
3            sales  120000  False
4            hr       5000  True
5            hr       6000  True
Depending on what you would like the output to be, you can assign the return values back to the original DataFrame, as done here with df.loc.
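The per-group loop can also be written without an explicit loop. The following is a minimal sketch, not the original answer's method: it rebuilds the example data from the question and uses groupby(...).transform("std") to broadcast each dept's standard deviation back onto its rows. The column name outlier_z and the mean-centred variant are assumptions on my part. The function in the question computes mean but never uses it, so a z-score check may have been the intent, but the source does not confirm that.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "employee_id": [1, 2, 3, 4, 5],
    "dept": ["sales", "sales", "sales", "hr", "hr"],
    "Salary": [10000, 110000, 120000, 5000, 6000],
})

threshold = 3
salary_by_dept = df.groupby("dept")["Salary"]

# transform("std") returns one value per original row: the sample standard
# deviation (ddof=1, the pandas default) of that row's dept.
std_per_row = salary_by_dept.transform("std")

# Same rule as std_div in the question: value / group std > threshold
df["outlier"] = df["Salary"] / std_per_row > threshold

# Hypothetical variant (assumption about intent): a z-score check that
# centres each value on its dept mean before dividing by the dept std.
mean_per_row = salary_by_dept.transform("mean")
df["outlier_z"] = (df["Salary"] - mean_per_row).abs() / std_per_row > threshold

print(df)
```

On this data the outlier column matches the loop's result above (False for sales, True for hr), while the z-score variant flags no rows, because with only two or three values per dept no value lies more than three standard deviations from its group mean.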