I have this function that I found on GitHub:
def std_div(data, threshold=3):
    std = data.std()
    mean = data.mean()
    isOutlier = []
    for val in data:
        if val/std > threshold:
            isOutlier.append(True)
        else:
            isOutlier.append(False)
    return isOutlier
I want to apply this to my DataFrame separately for each group (dept):
employee_id  dept   Salary
1            sales  10000
2            sales  110000
3            sales  120000
4            hr     5000
5            hr     6000
This works, but it calculates the standard deviation over the entire DataFrame rather than per dept:
df["std_div"] = df.from_dict(std_div(df.Salary))
You could group by the column of interest and then loop over the groups, running the function on the Salary column of each group in turn:
for name, group in df.groupby('dept'):
    df.loc[group.index, 'outlier'] = std_div(group.Salary)
df
employee_id  dept   Salary  outlier
1            sales  10000   False
2            sales  110000  False
3            sales  120000  False
4            hr     5000    True
5            hr     6000    True
Depending on what you would like the output to be, you can assign the return values back to the original DataFrame, as the loop above does with df.loc.
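If the loop becomes slow on larger data, the same per-group rule can probably be written without it by using groupby with transform. The following is only a sketch: flag_by_group and its center parameter are hypothetical names that are not part of the original post. With center=False it reproduces the val/std rule of std_div as posted. With center=True it subtracts the group mean first, which is the more conventional z-score test; treating that as the intended behaviour is an assumption, since std_div computes the mean but never uses it.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "employee_id": [1, 2, 3, 4, 5],
    "dept": ["sales", "sales", "sales", "hr", "hr"],
    "Salary": [10000, 110000, 120000, 5000, 6000],
})


def flag_by_group(frame, threshold=3, center=False):
    """Hypothetical helper: flag salaries per dept without an explicit loop."""
    grouped = frame.groupby("dept")["Salary"]
    # transform returns a Series aligned with the original rows, so every
    # row receives the standard deviation of its own dept.
    std = grouped.transform("std")
    if center:
        # Assumed variant: distance from the group mean, in group std units.
        mean = grouped.transform("mean")
        score = (frame["Salary"] - mean).abs() / std
    else:
        # Same rule as the posted std_div: raw value divided by group std.
        score = frame["Salary"] / std
    return score > threshold


df["outlier"] = flag_by_group(df)
df["outlier_centered"] = flag_by_group(df, center=True)
print(df)
```

On the sample data, the uncentered rule flags both hr rows, matching the loop output above, while the centered rule flags nothing. A dept with a single row has an undefined (NaN) standard deviation, so its row comes out as False under either rule.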