How to apply a function to a pandas DataFrame by group

Question

I have this function I found on GitHub:

def std_div(data, threshold=3):
    std = data.std()
    mean = data.mean()  # computed but never used below
    isOutlier = []
    for val in data:
        # Divides the raw value by std; the mean is not subtracted first
        if val/std > threshold:
            isOutlier.append(True)
        else:
            isOutlier.append(False)
    return isOutlier
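Because std_div compares each raw value to the standard deviation instead of measuring its distance from the mean, it is not a standard z-score test. If a z-score test is what's intended, a minimal vectorized sketch might look like this (`zscore_outliers` is a hypothetical name, not part of the original function):

```python
import pandas as pd


def zscore_outliers(data: pd.Series, threshold: float = 3) -> pd.Series:
    """Hypothetical z-score variant of std_div.

    Flags values whose distance from the mean exceeds `threshold`
    sample standard deviations.
    """
    z = (data - data.mean()) / data.std()  # Series.std() defaults to ddof=1
    return z.abs() > threshold


# 20 zeros and one 100: only the last value lies more than 3 std devs from the mean
flags = zscore_outliers(pd.Series([0] * 20 + [100]))
print(flags[flags].index.tolist())  # -> [20]
```

One caveat for any per-group z-score test: with the sample standard deviation (pandas' default, ddof=1), the largest possible |z| in a group of n values is (n-1)/√n. In groups as small as the ones below (2 or 3 rows), a threshold of 3 can therefore never be reached.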

I want to apply this to my DataFrame for each group (dept):

employee_id  dept   Salary
1            sales  10000
2            sales  110000
3            sales  120000
4            hr     5000
5            hr     6000

This works, but it calculates the standard deviation over the entire DataFrame rather than per department:

df["std_div"] = std_div(df.Salary)  # one std computed over all rows
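To see the difference the question describes, the sketch below (assuming the sample data above, rebuilt by hand) compares the single standard deviation that `std_div(df.Salary)` uses with one standard deviation per department, computed via `groupby(...).transform("std")`. The `dept_std` column name is purely illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "employee_id": [1, 2, 3, 4, 5],
    "dept": ["sales", "sales", "sales", "hr", "hr"],
    "Salary": [10000, 110000, 120000, 5000, 6000],
})

# A single standard deviation over every row: this is what std_div(df.Salary) sees
overall_std = df["Salary"].std()

# One standard deviation per department, broadcast back onto that department's rows
df["dept_std"] = df.groupby("dept")["Salary"].transform("std")

print(overall_std)
print(df[["dept", "Salary", "dept_std"]])
```

Every sales row gets the sales standard deviation and every hr row gets the hr one, which is what a per-department outlier test needs.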

Answer 1

You could group by the column of interest and then use a for loop to run the function on the Salary column of each group:

# Run std_div separately on each department's salaries
for name, group in df.groupby('dept'):
    df.loc[group.index, 'outlier'] = std_div(group.Salary)

df
employee_id dept    Salary  outlier
1           sales   10000   False
2           sales   110000  False
3           sales   120000  False
4           hr      5000    True
5           hr      6000    True

Depending on what you want the output to be, you can assign the return values back to the original DataFrame, as done here with the outlier column.
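As an alternative to the explicit loop, here is a sketch using `groupby(...).transform(...)`, which calls a function once per group and aligns the result back to the original rows. `std_div_vectorized` is a hypothetical helper that applies the same value/std test as std_div but returns a boolean Series instead of building a list.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "employee_id": [1, 2, 3, 4, 5],
    "dept": ["sales", "sales", "sales", "hr", "hr"],
    "Salary": [10000, 110000, 120000, 5000, 6000],
})


def std_div_vectorized(data: pd.Series, threshold: float = 3) -> pd.Series:
    # Same test as std_div (value / std > threshold), without the Python loop
    return data / data.std() > threshold


# transform() runs the function once per department and keeps the original
# row order, so the result can be assigned directly as a new column
df["outlier"] = df.groupby("dept")["Salary"].transform(std_div_vectorized)
print(df)
```

For the sample data this should flag the two hr rows and none of the sales rows, the same result as the loop above, without needing `.loc` assignments.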

Answer 1: accepted, 2017-03-22 18:46:16
