Replace the Outlier in a group with the mode of the group in a pandas series

Question

In the following dataframe I want to replace the outliers in the EMI column with the mode of the group. Here's sample data.

Id	C_Id	EMI
1	1000	141
2	1000	141
3	1000	21538
4	2000	313
5	2000	313
6	2000	31528
7	3000	0
8	3000	0
9	3000	3000
10	3000	4000

I am expecting the output to be like this.

Id	C_Id	EMI
1	1000	141
2	1000	141
3	1000	141
4	2000	313
5	2000	313
6	2000	313
7	3000	0
8	3000	0
9	3000	0
10	3000	0

Answer 1

The first step is to calculate the mode of each group:

from scipy import stats
# Most frequent EMI value within each C_Id group
modes = df.groupby('C_Id').agg({'EMI': lambda x: stats.mode(x)[0]}).reset_index()
modes

Which will give you:

	C_Id	EMI
0	1000	141
1	2000	313
2	3000	0
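As a pandas-only alternative, the same per-group modes can be sketched with `Series.mode`, which avoids relying on the shape of the `stats.mode` result (it has differed between SciPy releases). The snippet rebuilds the sample data from the question; picking the first value when a group has several modes is an assumption, not something the question specifies.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": range(1, 11),
    "C_Id": [1000] * 3 + [2000] * 3 + [3000] * 4,
    "EMI": [141, 141, 21538, 313, 313, 31528, 0, 0, 3000, 4000],
})

# Series.mode() returns every mode, sorted ascending.
# Assumption: on ties, keep the smallest one via .iloc[0].
modes = (
    df.groupby("C_Id")["EMI"]
      .agg(lambda s: s.mode().iloc[0])
      .reset_index()
)
print(modes)
```

The resulting frame has one row per `C_Id`, so it can be merged back onto `df` in the same way as the SciPy-based `modes`.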

Then it depends on your definition of "outlier". If an outlier is simply any value that differs from the group's mode, it's just:

df.drop(columns = ['EMI']).merge(modes, on=['C_Id'])

	Id	C_Id	EMI
0	1	1000	141
1	2	1000	141
2	3	1000	141
3	4	2000	313
4	5	2000	313
5	6	2000	313
6	7	3000	0
7	8	3000	0
8	9	3000	0
9	10	3000	0
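If you would rather avoid the drop-and-merge round trip, a minimal sketch using `groupby(...).transform` does the same thing in place: it broadcasts each group's mode back onto the original rows, so the index and row order of `df` are untouched. The sample data is rebuilt from the question, and taking the first mode on ties is an assumption.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": range(1, 11),
    "C_Id": [1000] * 3 + [2000] * 3 + [3000] * 4,
    "EMI": [141, 141, 21538, 313, 313, 31528, 0, 0, 3000, 4000],
})

# transform() returns a Series aligned with df, holding each row's group mode.
result = df.copy()
result["EMI"] = df.groupby("C_Id")["EMI"].transform(lambda s: s.mode().iloc[0])
print(result)
```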

However, if you have a specific criterion for what counts as an outlier, you can do:

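# Attach each group's mode as EMI_y next to the original EMI
merged = df.merge(modes, on=['C_Id'], suffixes=('', '_y'))
# Flag rows to replace; use your own outlier criterion here
merged['replacement'] = merged.EMI.gt(merged.EMI_y * 10)
# Overwrite only the flagged rows with the group mode
merged.loc[merged.replacement, 'EMI'] = merged.loc[merged.replacement, 'EMI_y']
merged.drop(columns=['EMI_y', 'replacement'])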

This still gives the same output for your example, but the replacement is now driven by the criterion you set:

	Id	C_Id	EMI
0	1	1000	141
1	2	1000	141
2	3	1000	141
3	4	2000	313
4	5	2000	313
5	6	2000	313
6	7	3000	0
7	8	3000	0
8	9	3000	0
9	10	3000	0
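To see where a criterion actually changes the result, here is a small sketch of the same "greater than 10 times the mode" rule written with `transform` and `Series.mask`. The row with Id 11 is hypothetical and not part of the question's data: it differs from its group's mode but does not meet the criterion, so it is left alone. Taking the first mode on ties is likewise an assumption.

```python
import pandas as pd

# Sample data from the question, plus one HYPOTHETICAL row (Id 11, EMI 150)
# added only to show a non-mode value that is not treated as an outlier.
df = pd.DataFrame({
    "Id": list(range(1, 12)),
    "C_Id": [1000] * 3 + [2000] * 3 + [3000] * 4 + [1000],
    "EMI": [141, 141, 21538, 313, 313, 31528, 0, 0, 3000, 4000, 150],
})

# Per-row group mode, aligned with df's index.
mode = df.groupby("C_Id")["EMI"].transform(lambda s: s.mode().iloc[0])

# Same criterion as above: more than 10x the group's mode.
is_outlier = df["EMI"] > mode * 10

# mask() replaces values where the condition is True and keeps the rest.
result = df.assign(EMI=df["EMI"].mask(is_outlier, mode))
print(result)
```

Here 21538, 31528, 3000 and 4000 are replaced by their group modes, while 150 is kept because it is below 10 times 141.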
