简体   繁体   中英

How to apply function to panda dataframe with group by

I have this function i found on git hub.

def std_div(data, threshold=3):
    std = data.std()
    mean = data.mean()
    isOutlier = []
    for val in data:
        if val/std > threshold:
            isOutlier.append(True)
        else:
            isOutlier.append(False)
    return isOutlier

I want to apply this to my dataFrame for each group(dept)

     employee_id  dept            Salary
      1             sales           10000
      2             sales           110000 
      3             sales           120000
      4             hr              5000
      5             hr              6000 

This works, but it calculates the std div for the entire data frame.

df["std_div"]= df.from_dict(std_div(df.Salary))

You could do something along the lines of the following, where you group by the column of interest then use a for loop to run the function on the column for that specific group

for name, group in df.groupby('dept'):
    df.loc[group.index, 'outlier'] = std_div(group.Salary)

df
employee_id dept    Salary  outlier
1           sales   10000   False
2           sales   110000  False
3           sales   120000  False
4           hr      5000    True
5           hr      6000    True

Depending on what you would like that output to be, you can assign the return values to the original dataframe

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM