How to find outliers within groups in a dataframe

I have a df which looks like the following:

Group. Score.
red 34
blue 42
green 1000
green 34
blue 34
red 42

I would like to add a column onto this which specifies if the value is an outlier. If there were no groups then I would use something like:

df['outliers'] = df[col] > df[col].mean() + 3 * df[col].std()

But how would I do this so it is within the groups?

You can use GroupBy.transform:

df["is_outlier"] = df.groupby("Group.")["Score."].transform(lambda x: (x - x.mean()).abs() > 3 * x.std())

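Within each group, we compute each element's deviation from the group mean and check whether its absolute value exceeds 3 times the group's standard deviation. transform returns a result aligned with the original index, so it can be assigned directly as a new boolean column.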
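
For reference, here is a minimal self-contained sketch of the same idea, assuming the columns are literally named "Group." and "Score." as in the question. The helper name flag_group_outliers, its n_std parameter, and the extra frame named big are illustrative assumptions, not part of the original answer. It computes the group mean and standard deviation with two transform calls instead of a lambda, which should give the same result.

```python
import pandas as pd


def flag_group_outliers(df, group_col, value_col, n_std=3.0):
    """Return a boolean Series that is True where a value lies more than
    n_std group standard deviations away from its group mean.

    Hypothetical helper for illustration only.
    """
    grouped = df.groupby(group_col)[value_col]
    group_mean = grouped.transform("mean")
    group_std = grouped.transform("std")  # sample std (ddof=1), the pandas default
    # Groups with a single row have NaN std; comparisons with NaN are False,
    # so such rows are never flagged.
    return (df[value_col] - group_mean).abs() > n_std * group_std


# The sample data from the question.
df = pd.DataFrame({
    "Group.": ["red", "blue", "green", "green", "blue", "red"],
    "Score.": [34, 42, 1000, 34, 34, 42],
})
df["is_outlier"] = flag_group_outliers(df, "Group.", "Score.")

# A larger, made-up group: ten ordinary scores and one extreme score.
big = pd.DataFrame({
    "Group.": ["green"] * 11,
    "Score.": [34] * 10 + [1000],
})
big["is_outlier"] = flag_group_outliers(big, "Group.", "Score.")
```

One caveat on the 3-standard-deviation rule: with the sample standard deviation, a value in a group of n rows can be at most (n - 1) / sqrt(n) standard deviations from the group mean. A threshold of 3 can therefore only flag values in groups of at least 11 rows. In the six-row sample above, every group has two rows, so nothing is flagged, not even the 1000 in the green group. In the made-up 11-row group, only the 1000 is flagged. For small groups, a lower threshold or a rule based on the median may be more appropriate.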
