Numpy and Pandas interpolation also changes the original data

Question

I am trying to interpolate data for some missing days. The original data is:

2012-06-27 00:00:00 17
2012-06-27 01:00:00 17
2012-06-27 02:00:00 18
2012-06-27 03:00:00 18
2012-06-27 04:00:00 19
2012-06-27 05:00:00 20
2012-06-27 06:00:00 22
2012-06-27 07:00:00 23
2012-06-27 08:00:00 25
2012-06-27 09:00:00 27
2012-06-27 10:00:00 27
2012-06-27 11:00:00 29
2012-06-27 12:00:00 29
2012-06-27 13:00:00 30
2012-06-27 14:00:00 30
2012-06-27 15:00:00 29
2012-06-27 16:00:00 28
2012-06-27 17:00:00 26
2012-06-27 18:00:00 25
2012-06-27 19:00:00 24
2012-06-27 20:00:00 23
2012-06-27 21:00:00 23
2012-06-27 22:00:00 16
2012-06-27 23:00:00 15
2012-06-29 00:00:00 15
2012-06-29 01:00:00 16
2012-06-29 02:00:00 16
2012-06-29 03:00:00 16
2012-06-29 04:00:00 17
2012-06-29 05:00:00 17
2012-06-29 06:00:00 18
2012-06-29 07:00:00 19
2012-06-29 08:00:00 20
2012-06-29 09:00:00 22
2012-06-29 10:00:00 22
2012-06-29 11:00:00 22
2012-06-29 12:00:00 22
2012-06-29 13:00:00 22
2012-06-29 14:00:00 22
2012-06-29 15:00:00 22
2012-06-29 16:00:00 21
2012-06-29 17:00:00 19
2012-06-29 18:00:00 17
2012-06-29 19:00:00 16
2012-06-29 20:00:00 15
2012-06-29 21:00:00 14
2012-06-29 22:00:00 14
2012-06-29 23:00:00 13

As you can see, 2012-06-28 is missing, so I tried to interpolate it using both Numpy and Pandas. For Numpy the code is:

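import numpy as np

def inter_lin_nan(ts_temp, rule):
    # Resample to a regular grid; slots with no data become NaN
    ts_temp = ts_temp.resample(rule).mean()
    mask = np.isnan(ts_temp)
    # Linearly interpolate the missing values from the known ones
    ts_temp[mask] = np.interp(np.flatnonzero(mask), np.flatnonzero(~mask), ts_temp[~mask])
    return ts_temp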

and with Pandas I used:

df_temp=df_temp.asfreq('1h')
df_temp['Temp2'] = df_temp['temp'].interpolate(method='linear')

The problem is that both of these methods do interpolate the missing day, but they also change the original data for 2012-06-29. Do you know why this is happening, or am I missing something?

Answer 1

I cannot reproduce the problem, but this works for me (assuming your data frame is indexed on datetime):

df_resampled = df.resample('1H').interpolate(method='linear')

Output: [plot comparing the original and the resampled series]

As you can see, the lines overlap perfectly on the days where there is data: no original data is 'changed'. The interpolation seems to make sense too. In this plot, the missing values in the original series were set to 0 to allow a comparison.
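
If you want to check this numerically rather than by eye, something like the following may help. It is only a sketch: the frame `df`, its `temp` column and the synthetic values are made up for illustration and are not the asker's data. It builds two days of hourly readings with a one-day gap, applies the same resample-and-interpolate call, and compares the result with the original values at the original timestamps.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings for 2012-06-27 and 2012-06-29;
# the whole of 2012-06-28 is absent, as in the question.
index = pd.date_range("2012-06-27", periods=24, freq="h").append(
    pd.date_range("2012-06-29", periods=24, freq="h")
)
values = 20 + 5 * np.sin(np.arange(len(index)) / 24 * 2 * np.pi)
df = pd.DataFrame({"temp": values}, index=index)

# Put the data on a regular hourly grid (the gap becomes NaN),
# then fill the gap by linear interpolation.
df_resampled = df.resample("h").interpolate(method="linear")

# Compare the interpolated frame with the original at the original timestamps.
unchanged = np.array_equal(
    df_resampled.loc[df.index, "temp"].to_numpy(),
    df["temp"].to_numpy(),
)
print("original values unchanged:", unchanged)
print("rows after resampling:", len(df_resampled))
```

If the same comparison on your own frame reports a difference, it would be worth looking at the input first (for example duplicate or non-hourly timestamps) before suspecting the interpolation step.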

Answer 1 (accepted), 2015-12-22 17:16:54