How to impute the incoming missing value?
I have the following data:
time_s rpm_motor_1 rpm_motor_2 vibration
0 0.00 7200.0 0.0 0.56
1 0.02 7469.3 0.0 0.58
2 0.04 7774.8 0.0 0.62
3 0.10 8181.8 0.0 0.63
4 0.12 7948.0 0.0 0.60
5 0.14 7982.9 0.0 0.60
6 0.16 7146.3 0.0 0.54
7 0.18 6693.4 0.0 0.48
8 0.20 6389.0 0.0 0.41
9 0.20 6389.0 0.0 0.41
10 0.22 7144.1 0.0 0.0
11 0.24 7251.4 0.0 0.49
12 0.26 7014.1 0.0 0.49
13 0.28 6500.4 0.0 0.40
14 0.30 6261.6 0.0 0.32
15 0.32 6236.0 0.0 0.0
16 0.34 6391.2 0.0 0.40
17 0.36 6953.2 0.0 0.54
18 0.38 7202.0 0.0 0.54
19 0.40 6582.6 0.0 0.40
20 0.42 6967.0 0.0 0.55
21 0.44 6941.0 0.0 0.53
22 0.46 6288.7 0.0 0.40
23 0.48 6219.8 0.0 0.37
24 0.50 6648.6 0.0 0.41
25 0.52 6846.4 0.0 0.46
26 0.54 6571.8 0.0 0.47
27 0.56 7171.3 0.0 0.58
28 0.58 6779.0 0.0 0.51
29 0.60 7021.8 0.0 0.48
30 0.62 6795.6 0.0 0.42
31 0.64 6358.8 0.0 0.40
32 0.66 6917.0 0.0 0.42
33 0.68 6944.0 0.0 0.50
34 0.70 7149.2 0.0 0.0
35 0.72 7381.6 0.0 0.53
36 0.74 7383.5 0.0 0.49
37 0.76 6120.1 0.0 0.37
38 0.78 6185.4 0.0 0.35
39 0.80 6481.2 0.0 0.38
40 0.82 6390.4 0.0 0.31
41 0.84 7136.9 0.0 0.51
42 0.86 6740.2 0.0 0.51
43 0.88 7179.3 0.0 0.58
44 0.90 6910.7 0.0 0.46
45 0.92 6978.7 0.0 0.47
46 0.94 6625.7 0.0 0.46
47 0.96 6515.2 0.0 0.39
48 0.98 6649.9 0.0 0.45
49 1.00 6638.1 0.0 0.47
Some of the vibration values are 0.0.
When rpm is plotted against vibration, this is what it looks like.
There is a direct correlation between rpm and vibration: as rpm increases, vibration increases. The points at the bottom of the chart, along the x-axis, are the 0.0 values you see in the data frame.
My approach is to iterate through the data and, when I come across vibration[i] = 0.0, use the data that came before it to make an informed guess. I think a good way to impute this data would be KNN, but I am not able to import scikit-learn.
If you have a better approach to replacing the 0.0 values, I would love to hear it.
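The idea described above (use only earlier rows, and pick the ones that look most like the current row) can be sketched without scikit-learn. This is only a rough illustration, not a tested solution: the function name impute_from_past and the parameter k are made up for this example, and "nearest" is assumed to mean closest in rpm.

```python
import numpy as np

def impute_from_past(rpm, vibration, k=3):
    """Hypothetical helper: replace 0.0 vibration readings with the mean
    vibration of the k earlier rows whose rpm is closest to the current rpm."""
    rpm = np.asarray(rpm, dtype=float)
    filled = np.asarray(vibration, dtype=float).copy()
    for i in range(len(filled)):
        if filled[i] != 0.0:
            continue
        # Look only at rows that came before row i
        past_rpm = rpm[:i]
        past_vib = filled[:i]
        valid = past_vib != 0.0
        if not valid.any():
            continue  # no earlier reading to learn from; leave the 0.0 in place
        # Distance in rpm between the current row and each usable earlier row
        distances = np.abs(past_rpm[valid] - rpm[i])
        nearest = np.argsort(distances)[:k]
        filled[i] = past_vib[valid][nearest].mean()
    return filled

# Rows 6 to 10 of the data above; the last vibration reading is 0.0
rpm = [7146.3, 6693.4, 6389.0, 6389.0, 7144.1]
vibration = [0.54, 0.48, 0.41, 0.41, 0.0]
print(impute_from_past(rpm, vibration, k=1))
```

With k=1 the missing reading at 7144.1 rpm takes the value of the earlier row at 7146.3 rpm. A larger k averages over more earlier rows, which smooths the guess but needs enough history before the first gap.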
You could use pandas' interpolate to get a linearly interpolated result:
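import numpy as np
df.replace({'vibration': {0.0: np.nan}}, inplace=True)  # treat 0.0 readings as missing
df.interpolate(inplace=True)  # fill each gap linearly from the neighbouring rows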
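Interpolation fills each gap from the rows on either side and does not look at rpm. Since the question notes that vibration rises with rpm, another option might be to fit a straight line of vibration against rpm on the rows that do have a reading and predict the missing ones from it. The sketch below assumes a linear relationship is good enough and uses only a small sample of the data; it needs NumPy and pandas but not scikit-learn.

```python
import numpy as np
import pandas as pd

# Rows 8 to 14 of the data from the question
df = pd.DataFrame({
    "rpm_motor_1": [6389.0, 6389.0, 7144.1, 7251.4, 7014.1, 6500.4, 6261.6],
    "vibration":   [0.41, 0.41, 0.0, 0.49, 0.49, 0.40, 0.32],
})

missing = df["vibration"] == 0.0

# Fit vibration = slope * rpm + intercept on the rows that have a reading
slope, intercept = np.polyfit(
    df.loc[~missing, "rpm_motor_1"], df.loc[~missing, "vibration"], deg=1
)

# Predict the missing readings from their rpm
df.loc[missing, "vibration"] = slope * df.loc[missing, "rpm_motor_1"] + intercept
print(df)
```

Note that this fit uses every row with a reading, including later ones. If values have to be filled as they arrive, the fit would need to be restricted to the rows seen so far.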