Is there a better way to remove interpolated data from time series data in a pandas DataFrame?
I have time series data in which the missing values were filled by interpolation, but I would like to remove the interpolated data and replace it with np.nan values again.
Input Data:
Index Column_one Column_two
2017:10:03 03:44:00 13.61504936 14.65000057
2017:10:03 03:45:00 13.61504936 14.65000057
2017:10:03 03:46:00 13.61504936 14.65000057
2017:10:03 03:47:00 13.61504936 np.nan
2017:10:03 03:48:00 13.60000038 np.nan
2017:10:03 03:49:00 np.nan np.nan
2017:10:03 03:50:00 np.nan np.nan
2017:10:03 03:51:00 np.nan np.nan
2017:10:03 03:52:00 np.nan 14.80000019
2017:10:03 03:53:00 np.nan 14.80000019
2017:10:03 03:54:00 14.21253681 14.80000019
2017:10:03 03:55:00 14.24253273 14.80000019
All the missing values are filled by interpolation:
data_interpolated = data.interpolate()
Interpolated Data:
Index Column_one Column_two
2017:10:03 03:44:00 13.61504936 14.65000057
2017:10:03 03:45:00 13.61504936 14.65000057
2017:10:03 03:46:00 13.61504936 14.65000057
2017:10:03 03:47:00 13.61504936 14.67500051
2017:10:03 03:48:00 13.60000038 14.70000044
2017:10:03 03:49:00 13.70208979 14.72500038
2017:10:03 03:50:00 13.80417919 14.75000032
2017:10:03 03:51:00 13.9062686 14.77500025
2017:10:03 03:52:00 14.008358 14.80000019
2017:10:03 03:53:00 14.11044741 14.80000019
2017:10:03 03:54:00 14.21253681 14.80000019
2017:10:03 03:55:00 14.24253273 14.80000019
Now I would like to remove the interpolated values and recover the initial data set.
Desired Output:
Index Column_one Column_two
2017:10:03 03:44:00 13.61504936 14.65000057
2017:10:03 03:45:00 13.61504936 14.65000057
2017:10:03 03:46:00 13.61504936 14.65000057
2017:10:03 03:47:00 13.61504936 np.nan
2017:10:03 03:48:00 13.60000038 np.nan
2017:10:03 03:49:00 np.nan np.nan
2017:10:03 03:50:00 np.nan np.nan
2017:10:03 03:51:00 np.nan np.nan
2017:10:03 03:52:00 np.nan 14.80000019
2017:10:03 03:53:00 np.nan 14.80000019
2017:10:03 03:54:00 14.21253681 14.80000019
2017:10:03 03:55:00 14.24253273 14.80000019
Please let me know if there is a good way to implement this in pandas or NumPy.
I can suggest something like this:
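import numpy as np

# Assumes df has a default integer index and the value columns 'one' and 'two'.
# First pass: store the step from the previous row in helper columns.
for i in range(1, len(df)):
    df.loc[i, 'lin_one'] = df.loc[i, 'one'] - df.loc[i - 1, 'one']
    df.loc[i, 'lin_two'] = df.loc[i, 'two'] - df.loc[i - 1, 'two']
# Second pass: blank a value when the step to the next row changes by a non-zero amount below 0.003.
for i in range(len(df) - 1):
    if df.lin_one[i] - df.lin_one[i + 1] != 0 and df.lin_one[i] - df.lin_one[i + 1] < 0.003:
        df.loc[i, 'one'] = np.nan
    if df.lin_two[i] - df.lin_two[i + 1] != 0 and df.lin_two[i] - df.lin_two[i + 1] < 0.003:
        df.loc[i, 'two'] = np.nan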
This will produce the following output:
index one lin_one two lin_two
0 2017:10:03 03:44:00 13.615049 0.000000 14.650001 0.000
1 2017:10:03 03:45:00 13.615049 0.000000 14.650001 0.000
2 2017:10:03 03:46:00 13.615049 0.000000 NaN 0.000
3 2017:10:03 03:47:00 13.615049 0.000000 NaN 0.025
4 2017:10:03 03:48:00 NaN -0.015049 NaN 0.025
5 2017:10:03 03:49:00 NaN 0.102089 NaN 0.025
6 2017:10:03 03:50:00 NaN 0.102089 NaN 0.025
7 2017:10:03 03:51:00 NaN 0.102089 NaN 0.025
8 2017:10:03 03:52:00 NaN 0.102089 14.800000 0.025
9 2017:10:03 03:53:00 NaN 0.102089 14.800000 0.000
10 2017:10:03 03:54:00 14.212537 0.102089 14.800000 0.000
11 2017:10:03 03:55:00 14.242533 0.029996 14.800000 0.000
Then you can delete the helper columns lin_one and lin_two:
del df['lin_one']
del df['lin_two']
But this method also removes one value per column that was not interpolated (row 4 of one and row 2 of two in the output above).
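One way to avoid losing genuine values, when only the interpolated frame is left, is to compare the step into each point with the step out of it: a linearly interpolated point has the same non-zero step on both sides. The following is only a sketch of that idea, not a definitive method. The helper name drop_linear_runs and the tolerance tol are assumptions made for illustration, and the columns are named one and two as in the loop above. It cannot detect a gap that was filled between two equal values, and it will wrongly blank real measurements that happen to lie on a straight line.

```python
import pandas as pd

# Interpolated values copied from the question.
df = pd.DataFrame({
    "one": [13.61504936, 13.61504936, 13.61504936, 13.61504936,
            13.60000038, 13.70208979, 13.80417919, 13.9062686,
            14.008358, 14.11044741, 14.21253681, 14.24253273],
    "two": [14.65000057, 14.65000057, 14.65000057, 14.67500051,
            14.70000044, 14.72500038, 14.75000032, 14.77500025,
            14.80000019, 14.80000019, 14.80000019, 14.80000019],
})


def drop_linear_runs(series, tol=1e-6):
    """Hypothetical helper: blank points that sit on a sloped straight
    line between their two neighbours (tol is an assumed tolerance)."""
    step_in = series - series.shift(1)     # value minus previous value
    step_out = series.shift(-1) - series   # next value minus value
    same_slope = (step_in - step_out).abs() < tol
    sloped = step_in.abs() > tol           # ignore flat stretches
    # Comparisons against NaN are False, so the first and last rows are kept.
    return series.mask(same_slope & sloped)


restored = df.apply(drop_linear_runs)
print(restored)
```

On the data from the question this blanks rows 5 to 9 of one and rows 3 to 7 of two, which matches the desired output, and no helper columns have to be deleted afterwards.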
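If the frame is still available before data.interpolate() is called, guessing from differences can be avoided altogether: record where the gaps are first and put NaN back at exactly those positions later. A minimal sketch, assuming a small made-up frame with the question's column names (the sample values are taken from the question, but the five-row frame itself is only illustrative):

```python
import numpy as np
import pandas as pd

# Small illustrative frame; values borrowed from the question's data.
data = pd.DataFrame({
    "Column_one": [13.61504936, 13.60000038, np.nan, np.nan, 14.21253681],
    "Column_two": [14.65000057, np.nan, np.nan, 14.80000019, 14.80000019],
})

# Remember where values were missing before filling them.
was_missing = data.isna()

data_interpolated = data.interpolate()

# Put NaN back wherever the original frame had a gap.
restored = data_interpolated.mask(was_missing)
print(restored)
```

Because the boolean mask is stored separately, this works for any interpolation method and never touches a genuine value.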