
Numpy and Pandas interpolation also changes the original data

I am trying to interpolate data for some missing days. The original data is:

2012-06-27 00:00:00 17
2012-06-27 01:00:00 17
2012-06-27 02:00:00 18
2012-06-27 03:00:00 18
2012-06-27 04:00:00 19
2012-06-27 05:00:00 20
2012-06-27 06:00:00 22
2012-06-27 07:00:00 23
2012-06-27 08:00:00 25
2012-06-27 09:00:00 27
2012-06-27 10:00:00 27
2012-06-27 11:00:00 29
2012-06-27 12:00:00 29
2012-06-27 13:00:00 30
2012-06-27 14:00:00 30
2012-06-27 15:00:00 29
2012-06-27 16:00:00 28
2012-06-27 17:00:00 26
2012-06-27 18:00:00 25
2012-06-27 19:00:00 24
2012-06-27 20:00:00 23
2012-06-27 21:00:00 23
2012-06-27 22:00:00 16
2012-06-27 23:00:00 15
2012-06-29 00:00:00 15
2012-06-29 01:00:00 16
2012-06-29 02:00:00 16
2012-06-29 03:00:00 16
2012-06-29 04:00:00 17
2012-06-29 05:00:00 17
2012-06-29 06:00:00 18
2012-06-29 07:00:00 19
2012-06-29 08:00:00 20
2012-06-29 09:00:00 22
2012-06-29 10:00:00 22
2012-06-29 11:00:00 22
2012-06-29 12:00:00 22
2012-06-29 13:00:00 22
2012-06-29 14:00:00 22
2012-06-29 15:00:00 22
2012-06-29 16:00:00 21
2012-06-29 17:00:00 19
2012-06-29 18:00:00 17
2012-06-29 19:00:00 16
2012-06-29 20:00:00 15
2012-06-29 21:00:00 14
2012-06-29 22:00:00 14
2012-06-29 23:00:00 13

As you can see, 2012-06-28 is missing, so I tried to interpolate it using both Numpy and Pandas. For Numpy the code is:

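import numpy as np

def inter_lin_nan(ts_temp, rule):
    # Resample onto a regular grid; periods with no data become NaN.
    # (.mean() is needed in current pandas, where resample() alone returns a Resampler.)
    ts_temp = ts_temp.resample(rule).mean()
    mask = np.isnan(ts_temp)
    # Interpolate the missing values linearly, by position
    ts_temp[mask] = np.interp(np.flatnonzero(mask), np.flatnonzero(~mask), ts_temp[~mask])
    return ts_temp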

and with Pandas I used:

df_temp=df_temp.asfreq('1h')
df_temp['Temp2'] = df_temp['temp'].interpolate(method='linear')

The problem is that both of these methods do interpolate the missing day, but they also change the original data for 2012-06-29. Do you know why this is happening, or am I missing something?

I cannot reproduce the problem, but this works for me (assuming your data frame is indexed on datetime):

df_resampled = df.resample('1H').interpolate(method='linear')

Output:

[Image: plot of the original series and the interpolated series overlaid]

As you can see, the lines overlap perfectly on the days that have data: no original data is changed. The interpolation looks sensible too. In this plot the missing values in the original series were set to 0 to allow a comparison.
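If you would rather check this numerically than by eye, something like the following might help. It is only a sketch: it rebuilds the two days from the question as a Series (the names `temps`, `regular`, `filled` and `filled_np` are mine, not from the question), fills the gap once with pandas and once with `np.interp`, and compares the result against the original readings.

```python
import numpy as np
import pandas as pd

# Readings copied from the question: 2012-06-27 and 2012-06-29, hourly.
day1 = [17, 17, 18, 18, 19, 20, 22, 23, 25, 27, 27, 29,
        29, 30, 30, 29, 28, 26, 25, 24, 23, 23, 16, 15]
day2 = [15, 16, 16, 16, 17, 17, 18, 19, 20, 22, 22, 22,
        22, 22, 22, 22, 21, 19, 17, 16, 15, 14, 14, 13]

idx = pd.date_range("2012-06-27", periods=24, freq="h").append(
    pd.date_range("2012-06-29", periods=24, freq="h")
)
temps = pd.Series(day1 + day2, index=idx, name="temp", dtype="float64")

# Pandas: put the series on a regular hourly grid (the gap becomes NaN),
# then fill the NaNs linearly.
regular = temps.asfreq("h")
filled = regular.interpolate(method="linear")

# NumPy: the same fill done by position, as in the question's function.
mask = regular.isna().to_numpy()
filled_np = regular.to_numpy().copy()
filled_np[mask] = np.interp(
    np.flatnonzero(mask), np.flatnonzero(~mask), filled_np[~mask]
)

# Compare the filled series with the original at the original timestamps.
unchanged = np.array_equal(filled.loc[temps.index].to_numpy(), temps.to_numpy())
same_result = np.allclose(filled.to_numpy(), filled_np)
print("original readings unchanged:", unchanged)
print("pandas and numpy agree:", same_result)
print("hours filled in:", int(mask.sum()))
```

One thing to watch for when comparing by eye: once NaN appears in an integer column, pandas stores it as float, so 17 will print as 17.0 even though the value itself is the same.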
