I would like to remove outliers from a DataFrame using the mean and standard deviation in Python. However, instead of simply deleting the outliers, I want to replace them with NaN and then save the result as a DataFrame again. This is my question.
I came up with the code below, but I do not know how to continue from here. Any approach that solves the problem is fine; it does not have to follow this one.
df_group = df.groupby('count')
df_group_mean = df_group.mean()
df_group_std = df_group.std()
index_list = df_group_mean.index
col_list = ["A", "B", "C", "D"]
for IndexList in index_list:
    temp = df.iloc[IndexList]
    for ColList in col_list:
        mean = df_group_mean.loc[IndexList, ColList]
        std = df_group_std.loc[IndexList, ColList]
        temp[ColList] = np.where(temp[ColList] > mean + (std * sigma), np.nan, temp[ColList])
        temp[ColList] = np.where(temp[ColList] < mean - (std * sigma), np.nan, temp[ColList])
You probably need something like this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'x':[-30,-2,0,1,2,4,5,7,8,9,10,10,34]})
Treat values that lie more than 2 standard deviations above or below the mean as outliers. In this example, the first and last values are replaced with NaN.
df[ (df['x'] > df['x'].mean()+2*df['x'].std()) | (df['x'] < df['x'].mean()-2*df['x'].std()) ] = np.nan
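If the mean and standard deviation should be computed per group, as in the question, the same idea can be applied with groupby and transform. This is only a sketch under assumptions: the function name mask_group_outliers and the demo data are made up for illustration, and the grouping column 'count' and the sigma threshold are taken from the question. Unlike the one-liner above, it sets only the offending cells to NaN rather than the whole row, and it returns a new DataFrame.

```python
import numpy as np
import pandas as pd

def mask_group_outliers(df, group_col, value_cols, sigma=2.0):
    """Return a copy of df where per-group outliers in value_cols are NaN.

    Hypothetical helper: a value counts as an outlier when it lies more
    than `sigma` standard deviations from the mean of its own group.
    """
    out = df.copy()
    grouped = df.groupby(group_col)[value_cols]
    # transform() broadcasts each group's statistic back to the original rows
    mean = grouped.transform("mean")
    std = grouped.transform("std")
    is_outlier = (df[value_cols] - mean).abs() > sigma * std
    # mask() replaces cells where the condition is True with NaN
    out[value_cols] = df[value_cols].mask(is_outlier)
    return out

# Demo data (made up): group 1 reuses the 'x' values from the example above
demo = pd.DataFrame({
    "count": [1] * 13 + [2] * 3,
    "A": [-30.0, -2, 0, 1, 2, 4, 5, 7, 8, 9, 10, 10, 34, 1, 2, 3],
})
result = mask_group_outliers(demo, "count", ["A"], sigma=2.0)
# Rows 0 and 12 of column A become NaN; group 2 is left unchanged
```

For the columns in the question, the call would presumably be mask_group_outliers(df, 'count', ['A', 'B', 'C', 'D'], sigma). Groups with a single row have an undefined standard deviation, so nothing in them is flagged.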