How to apply a function to a pandas DataFrame with groupby

Question

I have this function that I found on GitHub:

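def std_div(data, threshold=3):
    std = data.std()
    mean = data.mean()  # computed but not used below
    isOutlier = []
    for val in data:
        # flag values whose ratio to the standard deviation exceeds the threshold
        if val/std > threshold:
            isOutlier.append(True)
        else:
            isOutlier.append(False)
    return isOutlier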

I want to apply this to my DataFrame for each group (dept):

    employee_id  dept   Salary
    1            sales  10000
    2            sales  110000
    3            sales  120000
    4            hr     5000
    5            hr     6000

This works, but it calculates the standard deviation over the entire DataFrame instead of per dept:

df["std_div"]= df.from_dict(std_div(df.Salary))

Answer 1

You could group by the column of interest, then use a for loop to run the function on the Salary column of each group:

for name, group in df.groupby('dept'):
    df.loc[group.index, 'outlier'] = std_div(group.Salary)

df
employee_id dept    Salary  outlier
1           sales   10000   False
2           sales   110000  False
3           sales   120000  False
4           hr      5000    True
5           hr      6000    True

Depending on what you would like the output to be, you can assign the returned values back to the original DataFrame, as the loop does here with the outlier column.
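
As an alternative sketch, groupby(...).transform can apply a per-group function without an explicit loop. The example below assumes the sample data from the question; ratio_flags and zscore_flags are hypothetical helper names, not part of pandas or of the original function. ratio_flags applies the same val/std test as std_div, while zscore_flags is an assumed variant that measures distance from the group mean, which may be closer to what an outlier check usually intends.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "employee_id": [1, 2, 3, 4, 5],
    "dept": ["sales", "sales", "sales", "hr", "hr"],
    "Salary": [10000, 110000, 120000, 5000, 6000],
})

def ratio_flags(s, threshold=3):
    # Same test as std_div: value divided by the group's standard deviation
    return s / s.std() > threshold

def zscore_flags(s, threshold=3):
    # Assumed alternative: distance from the group mean, in units of std
    return (s - s.mean()).abs() / s.std() > threshold

# transform calls the function once per dept and returns a result
# aligned with the original index, so it can be assigned directly
df["outlier"] = df.groupby("dept")["Salary"].transform(ratio_flags)
df["outlier_z"] = df.groupby("dept")["Salary"].transform(zscore_flags)

print(df)
```

With the sample data, outlier matches the column produced by the loop above. outlier_z is False for every row, because groups of only two or three values cannot reach a z-score of 3.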

1 answer

solution1
1 ACCEPTED 2017-03-22 18:46:16
