Remove outliers in a DataFrame in Python

Question

I would like to remove outliers from a DataFrame using the mean and standard deviation in Python. However, instead of simply deleting the outliers, I want to replace them with NaN and then save the result as a DataFrame again.

I came up with the code below, but I do not know how to continue from here. Any approach that solves the problem is fine; it does not have to follow this one.

df_group = df.groupby('count')
df_group_mean = df_group.mean()
df_group_std = df_group.std()
index_list = df_group_mean.index
col_list = ["A", "B", "C", "D"]

for IndexList in index_list:
    temp = df.iloc[IndexList]
    
    for ColList in col_list:
        mean = df_group_mean.loc[IndexList, ColList]
        std = df_group_std.loc[IndexList, ColList]        
        temp[ColList] = np.where(temp[ColList] > mean + (std * sigma), np.nan, temp[ColList])
        temp[ColList] = np.where(temp[ColList] < mean - (std * sigma), np.nan, temp[ColList])

Answer 1

You probably need something like this:

import pandas as pd
import numpy as np

df = pd.DataFrame({'x':[-30,-2,0,1,2,4,5,7,8,9,10,10,34]})

Treat values that lie more than 2 standard deviations above or below the mean as outliers. In this example the first and last values (-30 and 34) are turned into NaN.

df[ (df['x'] > df['x'].mean()+2*df['x'].std()) | (df['x'] < df['x'].mean()-2*df['x'].std()) ] = np.nan
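
The question groups the data by a 'count' column and checks several columns, so the same idea can be applied per group. The following is only a minimal sketch under assumptions: the helper name mask_outliers_by_group and the sample data are made up for illustration, and sigma defaults to 2 as in the example above. It uses groupby(...).transform to get each row's group mean and standard deviation, then DataFrame.mask to replace only the outlying cells with NaN.

```python
import numpy as np
import pandas as pd


def mask_outliers_by_group(df, group_col, value_cols, sigma=2.0):
    """Return a copy of df where per-group outliers in value_cols are NaN.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = df.copy()
    grouped = df.groupby(group_col)[value_cols]
    # transform() returns frames aligned row by row with df
    mean = grouped.transform("mean")
    std = grouped.transform("std")
    # A cell is an outlier if it is more than sigma group-stds from its group mean
    is_outlier = (df[value_cols] - mean).abs() > sigma * std
    out[value_cols] = df[value_cols].mask(is_outlier)
    return out


# Made-up sample data: two groups of eight rows, one extreme value per group in "A"
df = pd.DataFrame({
    "count": [1] * 8 + [2] * 8,
    "A": [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 100.0,
          10.0, 11.0, 12.0, 11.0, 10.0, 11.0, 12.0, -50.0],
    "B": [5.0, 6.0] * 8,
})

cleaned = mask_outliers_by_group(df, "count", ["A", "B"], sigma=2.0)
print(cleaned)
```

The result is still a DataFrame with the same shape as the input, and the original df is left unchanged. Unlike the row-wise assignment above, only the offending cells become NaN, so other columns in the same row keep their values. Note that in very small groups a single extreme value may not exceed 2 standard deviations, because it inflates the group's standard deviation itself.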

solution1
0 2020-12-14 07:55:36
