Replace outliers within each group with the group's mode in pandas

Question

In the following DataFrame I want to replace the outliers in the EMI column with the mode of the group. Here's sample data:

Id	C_Id	EMI
1	1000	141
2	1000	141
3	1000	21538
4	2000	313
5	2000	313
6	2000	31528
7	3000	0
8	3000	0
9	3000	3000
10	3000	4000

I expect the output to look like this:

Id	C_Id	EMI
1	1000	141
2	1000	141
3	1000	141
4	2000	313
5	2000	313
6	2000	313
7	3000	0
8	3000	0
9	3000	0
10	3000	0
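The answer below assumes the sample table is already loaded into a DataFrame named `df`. Here is a minimal way to build it. The second frame, `expected`, is a hypothetical name used only to hold the target output for comparison.

```python
import pandas as pd

# Sample input from the question
df = pd.DataFrame({
    "Id": range(1, 11),
    "C_Id": [1000] * 3 + [2000] * 3 + [3000] * 4,
    "EMI": [141, 141, 21538, 313, 313, 31528, 0, 0, 3000, 4000],
})

# Target result: every EMI value replaced by its group's mode
expected = df.assign(EMI=[141] * 3 + [313] * 3 + [0] * 4)

print(df)
print(expected)
```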

Answer 1

The first step is to compute the mode of EMI for each group:

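# Mode of EMI per C_Id group. Series.mode() returns the modes in sorted order,
# so .iloc[0] is a scalar (the smallest value on ties). On older SciPy versions
# stats.mode(x)[0] returns a length-1 array instead of a scalar.
modes = df.groupby('C_Id').agg({'EMI': lambda x: x.mode().iloc[0]}).reset_index()
modes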

This gives:

	C_Id	EMI
0	1000	141
1	2000	313
2	3000	0
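A group can have more than one mode. Pandas' `Series.mode` returns all of the modes in sorted order, so taking `.iloc[0]` resolves a tie to the smallest value. `scipy.stats.mode` also documents that it returns the smallest of tied values. The series below is hypothetical and is not taken from the question's data.

```python
import pandas as pd

# Hypothetical group in which two values are equally frequent
tied = pd.Series([5, 7, 7, 5])

# Series.mode() returns every mode, sorted ascending
print(tied.mode().tolist())  # [5, 7]

# Taking the first element therefore picks the smallest tied value
print(tied.mode().iloc[0])   # 5
```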

What comes next depends on how you define an "outlier". If an outlier is simply any value that differs from the group's mode, drop EMI and merge the per-group modes back in:

df.drop(columns = ['EMI']).merge(modes, on=['C_Id'])

	Id	C_Id	EMI
0	1	1000	141
1	2	1000	141
2	3	1000	141
3	4	2000	313
4	5	2000	313
5	6	2000	313
6	7	3000	0
7	8	3000	0
8	9	3000	0
9	10	3000	0
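The merge builds a new frame with a fresh index. A swapped-in alternative, not the answer's approach, is `groupby(...).transform`: it broadcasts each group's mode back onto that group's rows, so you can overwrite the column in place and keep `df`'s original index. This sketch uses pandas' `Series.mode` rather than SciPy.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Id": range(1, 11),
    "C_Id": [1000] * 3 + [2000] * 3 + [3000] * 4,
    "EMI": [141, 141, 21538, 313, 313, 31528, 0, 0, 3000, 4000],
})

# transform() returns one value per row, aligned to df's index; a scalar
# returned for a group is repeated across all rows of that group
df["EMI"] = df.groupby("C_Id")["EMI"].transform(lambda s: s.mode().iloc[0])
print(df)
```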

If you have a specific criterion for what counts as an outlier, merge the modes alongside the original values and replace only the rows that meet it:

# Put each row's group mode next to it as EMI_y
merged = df.merge(modes, on=['C_Id'], suffixes=('', '_y'))
# Flag outliers; use your own outlier criterion here
merged['replacement'] = merged.EMI.gt(merged.EMI_y * 10)
# Overwrite the flagged EMI values with their group's mode
merged.loc[merged.replacement, 'EMI'] = merged.loc[merged.replacement, 'EMI_y']
merged.drop(columns=['EMI_y', 'replacement'])

For your sample data this produces the same output, but only the rows that satisfy the criterion you set are replaced:

	Id	C_Id	EMI
0	1	1000	141
1	2	1000	141
2	3	1000	141
3	4	2000	313
4	5	2000	313
5	6	2000	313
6	7	3000	0
7	8	3000	0
8	9	3000	0
9	10	3000	0
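The helper columns can also be avoided. Compute the group mode with `transform`, then let `Series.mask` substitute it wherever the criterion holds. This is a sketch, not the answer's method. The "greater than 10x the group mode" rule is copied from the answer as a placeholder, and `is_outlier` is a hypothetical name.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Id": range(1, 11),
    "C_Id": [1000] * 3 + [2000] * 3 + [3000] * 4,
    "EMI": [141, 141, 21538, 313, 313, 31528, 0, 0, 3000, 4000],
})


def is_outlier(values: pd.Series, group_mode: pd.Series) -> pd.Series:
    """Hypothetical criterion: flag values greater than 10x their group's mode."""
    return values.gt(group_mode * 10)


# Each row's group mode (smallest value on ties), aligned to df's index
group_mode = df.groupby("C_Id")["EMI"].transform(lambda s: s.mode().iloc[0])

flags = is_outlier(df["EMI"], group_mode)

# mask() keeps EMI where flags is False and takes group_mode where it is True
df["EMI"] = df["EMI"].mask(flags, group_mode)
print(df)
```

Note how the placeholder rule behaves for C_Id 3000. That group's mode is 0, so the threshold `mode * 10` is also 0 and every positive EMI in the group is flagged. This matches the expected output here, but a multiplicative threshold may need adjusting for groups whose mode is 0.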

Answer 1 posted 2022-06-19 10:40:16