
What is the best way to remove interpolation from time series data in Pandas?

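Is there a better way to remove interpolated data from time series data in a pandas DataFrame?

I have time series data in which the missing values were filled by interpolation, but I want to remove the interpolated data and replace it with np.nan again.

Input data: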

Index                   Column_one     Column_two     
2017:10:03 03:44:00     13.61504936     14.65000057
2017:10:03 03:45:00     13.61504936     14.65000057
2017:10:03 03:46:00     13.61504936     14.65000057
2017:10:03 03:47:00     13.61504936     np.nan
2017:10:03 03:48:00     13.60000038     np.nan
2017:10:03 03:49:00     np.nan          np.nan
2017:10:03 03:50:00     np.nan          np.nan
2017:10:03 03:51:00     np.nan          np.nan
2017:10:03 03:52:00     np.nan          14.80000019
2017:10:03 03:53:00     np.nan          14.80000019
2017:10:03 03:54:00     14.21253681     14.80000019
2017:10:03 03:55:00     14.24253273     14.80000019

All missing values are filled by interpolation:

data_interpolated = data.interpolate()

Interpolated data:

Index                   Column_one     Column_two     
2017:10:03 03:44:00     13.61504936     14.65000057
2017:10:03 03:45:00     13.61504936     14.65000057
2017:10:03 03:46:00     13.61504936     14.65000057
2017:10:03 03:47:00     13.61504936     14.67500051
2017:10:03 03:48:00     13.60000038     14.70000044
2017:10:03 03:49:00     13.70208979     14.72500038
2017:10:03 03:50:00     13.80417919     14.75000032
2017:10:03 03:51:00     13.9062686      14.77500025
2017:10:03 03:52:00     14.008358       14.80000019
2017:10:03 03:53:00     14.11044741     14.80000019
2017:10:03 03:54:00     14.21253681     14.80000019
2017:10:03 03:55:00     14.24253273     14.80000019

Now I want to remove the interpolated values and get the initial dataset back.

Desired output:

Index                   Column_one     Column_two     
2017:10:03 03:44:00     13.61504936     14.65000057
2017:10:03 03:45:00     13.61504936     14.65000057
2017:10:03 03:46:00     13.61504936     14.65000057
2017:10:03 03:47:00     13.61504936     np.nan
2017:10:03 03:48:00     13.60000038     np.nan
2017:10:03 03:49:00     np.nan          np.nan
2017:10:03 03:50:00     np.nan          np.nan
2017:10:03 03:51:00     np.nan          np.nan
2017:10:03 03:52:00     np.nan          14.80000019
2017:10:03 03:53:00     np.nan          14.80000019
2017:10:03 03:54:00     14.21253681     14.80000019
2017:10:03 03:55:00     14.24253273     14.80000019

Please let me know if there is a good way to achieve this in Pandas or NumPy.
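interpolate() returns only the filled values and does not record which cells it filled, so they can only be guessed at afterwards. If the frame can still be captured before it is interpolated, a minimal sketch is to save the NaN mask first and re-apply it later. The small frame and the names was_missing and restored below are illustrative assumptions, not part of the question:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the original (pre-interpolation) data; values are illustrative.
data = pd.DataFrame({
    "Column_one": [13.6, np.nan, np.nan, 14.2],
    "Column_two": [14.65, np.nan, 14.8, 14.8],
})

# Remember which cells were missing BEFORE interpolating.
was_missing = data.isna()

data_interpolated = data.interpolate()

# Put NaN back wherever the original cell was missing.
restored = data_interpolated.mask(was_missing)
```

This only helps when the mask (or the original frame) is available; if only the interpolated frame survives, the filled cells have to be inferred from the shape of the data.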

I can offer you something like this:

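import numpy as np

# Assumes df has a default integer index and columns 'one' and 'two'.
# First differences of each column (row 0 has no predecessor, so it is skipped).
for i in range(1, len(df)):
    df.loc[i, 'lin_one'] = df.loc[i, 'one'] - df.loc[i - 1, 'one']
    df.loc[i, 'lin_two'] = df.loc[i, 'two'] - df.loc[i - 1, 'two']

# Set a value to NaN when the difference between its slope and the next one
# is non-zero and below 0.003.
for i in range(len(df) - 1):
    if df.lin_one[i] - df.lin_one[i+1] != 0 and df.lin_one[i] - df.lin_one[i+1] < 0.003:
        df.loc[i, 'one'] = np.nan
    if df.lin_two[i] - df.lin_two[i+1] != 0 and df.lin_two[i] - df.lin_two[i+1] < 0.003:
        df.loc[i, 'two'] = np.nan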

This produces the following output:

                  index        one   lin_one        two  lin_two
0   2017:10:03 03:44:00  13.615049  0.000000  14.650001    0.000
1   2017:10:03 03:45:00  13.615049  0.000000  14.650001    0.000
2   2017:10:03 03:46:00  13.615049  0.000000        NaN    0.000
3   2017:10:03 03:47:00  13.615049  0.000000        NaN    0.025
4   2017:10:03 03:48:00        NaN -0.015049        NaN    0.025
5   2017:10:03 03:49:00        NaN  0.102089        NaN    0.025
6   2017:10:03 03:50:00        NaN  0.102089        NaN    0.025
7   2017:10:03 03:51:00        NaN  0.102089        NaN    0.025
8   2017:10:03 03:52:00        NaN  0.102089  14.800000    0.025
9   2017:10:03 03:53:00        NaN  0.102089  14.800000    0.000
10  2017:10:03 03:54:00  14.212537  0.102089  14.800000    0.000
11  2017:10:03 03:55:00  14.242533  0.029996  14.800000    0.000

The computed columns lin_one and lin_two can then be deleted:

del df['lin_one']
del df['lin_two']

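However, this approach also wipes out one value that was not interpolated: in the output above, the last real value before each interpolated run (row 4 of one, row 2 of two) is set to NaN as well.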
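One way to keep those boundary values, sketched here as an assumption rather than a tested general solution, is to flag only the points that lie strictly inside a straight, non-flat segment: the step into the point equals the step out of it, and that step is not zero. The helper name drop_linear_runs and the tolerance atol are hypothetical; the data is the interpolated table from the question:

```python
import numpy as np
import pandas as pd

def drop_linear_runs(s, atol=1e-6):
    """Replace points inside a sloped, perfectly linear run with NaN."""
    back = s.diff()        # s[i] - s[i-1]
    fwd = -s.diff(-1)      # s[i+1] - s[i]
    # Interior of a line: step in equals step out (NaN at the edges compares False).
    interior = (back - fwd).abs() <= atol
    # Ignore flat stretches, which are indistinguishable from repeated real values.
    sloped = back.abs() > atol
    return s.mask(interior & sloped)

# Interpolated values copied from the question.
interpolated = pd.DataFrame({
    "Column_one": [13.61504936, 13.61504936, 13.61504936, 13.61504936,
                   13.60000038, 13.70208979, 13.80417919, 13.9062686,
                   14.008358, 14.11044741, 14.21253681, 14.24253273],
    "Column_two": [14.65000057] * 3
                  + [14.67500051, 14.70000044, 14.72500038,
                     14.75000032, 14.77500025]
                  + [14.80000019] * 4,
})

restored = interpolated.apply(drop_linear_runs)
```

On this data the NaN positions match the desired output, but the heuristic has limits: real measurements that happen to fall on a straight line are removed too, and a gap filled between two equal values is flat and therefore stays undetected.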
