In the following dataframe I want to replace the outliers in the EMI column with the mode of their group (rows are grouped by C_Id). Here is the sample data:
Id | C_Id | EMI |
---|---|---|
1 | 1000 | 141 |
2 | 1000 | 141 |
3 | 1000 | 21538 |
4 | 2000 | 313 |
5 | 2000 | 313 |
6 | 2000 | 31528 |
7 | 3000 | 0 |
8 | 3000 | 0 |
9 | 3000 | 3000 |
10 | 3000 | 4000 |
I am expecting the output to be like this.
Id | C_Id | EMI |
---|---|---|
1 | 1000 | 141 |
2 | 1000 | 141 |
3 | 1000 | 141 |
4 | 2000 | 313 |
5 | 2000 | 313 |
6 | 2000 | 313 |
7 | 3000 | 0 |
8 | 3000 | 0 |
9 | 3000 | 0 |
10 | 3000 | 0 |
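For anyone who wants to run the snippets that follow, here is a minimal sketch that rebuilds the sample data as a pandas DataFrame. The variable name `df` matches the code in the answer; the values are copied from the first table.

```python
import pandas as pd

# Sample data copied from the question's first table
df = pd.DataFrame({
    "Id": range(1, 11),
    "C_Id": [1000] * 3 + [2000] * 3 + [3000] * 4,
    "EMI": [141, 141, 21538, 313, 313, 31528, 0, 0, 3000, 4000],
})
print(df)
```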
The first step is to calculate the mode of each group:
from scipy import stats
modes = df.groupby('C_Id').agg({'EMI':lambda x:stats.mode(x)[0]}).reset_index()
modes
Which will give you:
| | C_Id | EMI |
|---|---|---|
| 0 | 1000 | 141 |
| 1 | 2000 | 313 |
| 2 | 3000 | 0 |
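As far as I know, the shape of the value returned by `scipy.stats.mode` has changed between SciPy releases, so indexing with `[0]` may not behave the same on every installation. A pandas-only sketch that avoids the dependency is shown here; it is an alternative, not the code used above. It assumes that when several values tie for the mode, taking the smallest one is acceptable.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": range(1, 11),
    "C_Id": [1000] * 3 + [2000] * 3 + [3000] * 4,
    "EMI": [141, 141, 21538, 313, 313, 31528, 0, 0, 3000, 4000],
})

# Series.mode() returns every tied mode in ascending order,
# so .iloc[0] picks the smallest one (an assumption about tie-breaking).
modes = (
    df.groupby("C_Id")["EMI"]
      .agg(lambda s: s.mode().iloc[0])
      .reset_index()
)
print(modes)
```

The resulting `modes` frame has the same two columns, `C_Id` and `EMI`, so it can be dropped into the merge steps unchanged.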
What comes next depends on your definition of "outlier". If an outlier is simply any value that differs from the group's mode, you can replace the whole column with the mode:
df.drop(columns = ['EMI']).merge(modes, on=['C_Id'])
| | Id | C_Id | EMI |
|---|---|---|---|
| 0 | 1 | 1000 | 141 |
| 1 | 2 | 1000 | 141 |
| 2 | 3 | 1000 | 141 |
| 3 | 4 | 2000 | 313 |
| 4 | 5 | 2000 | 313 |
| 5 | 6 | 2000 | 313 |
| 6 | 7 | 3000 | 0 |
| 7 | 8 | 3000 | 0 |
| 8 | 9 | 3000 | 0 |
| 9 | 10 | 3000 | 0 |
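The same "replace everything with the group mode" result can probably be reached without a merge by using `groupby(...).transform`, which broadcasts one value per group back onto the original rows. This is a sketch of that alternative, with `result` as an illustrative variable name and the smallest tied mode chosen by assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": range(1, 11),
    "C_Id": [1000] * 3 + [2000] * 3 + [3000] * 4,
    "EMI": [141, 141, 21538, 313, 313, 31528, 0, 0, 3000, 4000],
})

result = df.copy()
# transform() keeps the original row order and index,
# filling every row with the mode of its C_Id group.
result["EMI"] = result.groupby("C_Id")["EMI"].transform(lambda s: s.mode().iloc[0])
print(result)
```

Because `transform` preserves the original index, this variant avoids any row reordering that a merge might introduce.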
However, if you have a specific criterion for what counts as an outlier, you can apply it like this:
merged = df.merge(modes, on='C_Id', suffixes=('', '_y'))  # EMI_y holds each group's mode
# Flag the outliers; use your own criterion here
merged['replacement'] = merged.EMI.gt(merged.EMI_y * 10)
# Overwrite only the flagged rows with the group's mode
merged.loc[merged.replacement, 'EMI'] = merged.loc[merged.replacement, 'EMI_y']
merged.drop(columns=['EMI_y', 'replacement'])
This still gives the same output for your example, but the replacement is now driven by the criterion you set:
| | Id | C_Id | EMI |
|---|---|---|---|
| 0 | 1 | 1000 | 141 |
| 1 | 2 | 1000 | 141 |
| 2 | 3 | 1000 | 141 |
| 3 | 4 | 2000 | 313 |
| 4 | 5 | 2000 | 313 |
| 5 | 6 | 2000 | 313 |
| 6 | 7 | 3000 | 0 |
| 7 | 8 | 3000 | 0 |
| 8 | 9 | 3000 | 0 |
| 9 | 10 | 3000 | 0 |
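The criterion-based replacement can also be sketched without helper columns, using `transform` for the per-group mode and `Series.mask` for the conditional overwrite. The names `group_mode`, `is_outlier` and `result` are illustrative, and the "more than 10 times the mode" rule is just the example criterion from above, not a recommended definition of an outlier.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": range(1, 11),
    "C_Id": [1000] * 3 + [2000] * 3 + [3000] * 4,
    "EMI": [141, 141, 21538, 313, 313, 31528, 0, 0, 3000, 4000],
})

# Mode of each C_Id group, aligned row by row with df
group_mode = df.groupby("C_Id")["EMI"].transform(lambda s: s.mode().iloc[0])

# Example criterion (an assumption): outlier = more than 10x the group mode
is_outlier = df["EMI"] > group_mode * 10

# mask() replaces values where the condition is True and keeps the rest
result = df.assign(EMI=df["EMI"].mask(is_outlier, group_mode))
print(result)
```

One caveat with a multiplicative rule like this: when a group's mode is 0, every positive value is flagged, which is why 3000 and 4000 are both replaced in group 3000.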