
Drop rows from a pandas DataFrame that are more than 1 standard deviation from their group mean

In a pandas DataFrame, I want to drop rows whose value in a column is more than 1 standard deviation above or below the mean of its group.

For instance, I have a list of names, each associated with a state, and I want to drop every row whose price is more than 1 standard deviation above or below the mean price for its state.

Thanks.

#df
state price
a       10
a       30
a       60
b       60
b       50
...
n       x


stats = df.groupby('state')['price'].describe()


Edit: thanks @MYousefi, but looking at my output, I can still see outliers in the second graph.

[Image: Ans1]

Edit 2: problem solved with the link @MYousefi posted below.

One way to do it is to calculate each row's deviation from its group mean, divide it by the group's standard deviation, and select the rows where the result is at most 1.

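import pandas as pd

df = pd.DataFrame([['a', 10], ['a', 30], ['a', 60], ['b', 10], ['b', 50], ['b', 60]], columns=['state', 'price'])

# Mean and sample standard deviation of price for each state
agg = df.groupby('state')['price'].agg(['mean', 'std'])

# Index price by state so it aligns with agg, then keep rows within 1 std of the state mean
df[((df[['state', 'price']].set_index('state')['price'] - agg['mean']).abs() / agg['std']).reset_index(drop=True) <= 1]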

The output of the last statement should be:

  state  price
0     a     10
1     a     30
4     b     50
5     b     60

I also found the question "Pandas filter anomalies per group by Zscore", which I believe covers the same thing.
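
For comparison, here is a minimal sketch of the same per-group z-score filter written with groupby().transform(). It is not taken from the linked question; it is just one possible way to do it. It reuses the sample data from the answer above, and the names grouped, zscore and filtered are made up for illustration. Because transform() returns values aligned to the original row index, this version should not depend on the rows being sorted by state.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame(
    [['a', 10], ['a', 30], ['a', 60], ['b', 10], ['b', 50], ['b', 60]],
    columns=['state', 'price'],
)

# transform() broadcasts each group's statistic back onto df's own index,
# so no set_index/reset_index round trip is needed
grouped = df.groupby('state')['price']
zscore = (df['price'] - grouped.transform('mean')) / grouped.transform('std')

# Keep rows within 1 sample standard deviation (ddof=1, the pandas default)
# of their state's mean price
filtered = df[zscore.abs() <= 1]
print(filtered)
```

On this data it keeps rows 0, 1, 4 and 5, the same rows as the answer above. Note that a state with only one row has an undefined sample standard deviation (NaN), so that row fails the comparison and is dropped.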

