I'm looking to convert a data frame like the following example:
>>> df
vals
2019-08-10 12:03:05 1.0
2019-08-10 12:03:06 NaN
2019-08-10 12:03:07 NaN
2019-08-10 12:03:08 3.0
2019-08-10 12:03:09 4.0
2019-08-10 12:03:10 NaN
2019-08-10 12:03:11 NaN
2019-08-10 12:03:12 5.0
2019-08-10 12:03:13 NaN
2019-08-10 12:03:14 1.0
2019-08-10 12:03:15 NaN
2019-08-10 12:03:16 NaN
2019-08-10 12:03:17 6.0
into one such as:
>>> df
vals
2019-08-10 12:03:05 1.0
2019-08-10 12:03:06 1.667
2019-08-10 12:03:07 2.333
2019-08-10 12:03:08 3.0
2019-08-10 12:03:09 3.667
2019-08-10 12:03:10 4.333
2019-08-10 12:03:11 5.0
2019-08-10 12:03:12 3.667
2019-08-10 12:03:13 2.333
2019-08-10 12:03:14 1.0
2019-08-10 12:03:15 2.667
2019-08-10 12:03:16 4.333
2019-08-10 12:03:17 6.0
To get there, the data frame is first aligned to look like the following (taking the value closest to every third timestamp):
>>> df
vals
2019-08-10 12:03:05 1.0
2019-08-10 12:03:06 NaN
2019-08-10 12:03:07 NaN
2019-08-10 12:03:08 3.0
2019-08-10 12:03:09 NaN
2019-08-10 12:03:10 NaN
2019-08-10 12:03:11 5.0
2019-08-10 12:03:12 NaN
2019-08-10 12:03:13 NaN
2019-08-10 12:03:14 1.0
2019-08-10 12:03:15 NaN
2019-08-10 12:03:16 NaN
2019-08-10 12:03:17 6.0
It is then linearly interpolated between each pair of values to produce the final data frame. If there is a gap of more than 2 seconds, I'd like to skip interpolation between those two values.
This is what I've tried so far:
df.resample('3s').nearest()
Which produces:
>>> df.resample('3s').nearest()
vals
2019-08-10 12:03:03 1.0
2019-08-10 12:03:06 NaN
2019-08-10 12:03:09 4.0
2019-08-10 12:03:12 5.0
2019-08-10 12:03:15 NaN
Also:
>>> df.resample('2s').nearest()
vals
2019-08-10 12:03:04 1.0
2019-08-10 12:03:06 NaN
2019-08-10 12:03:08 3.0
2019-08-10 12:03:10 NaN
2019-08-10 12:03:12 5.0
2019-08-10 12:03:14 1.0
2019-08-10 12:03:16 NaN
Which makes it very clear that nearest is a complete lie, or at least a misnomer, because the nearest value to 12:03:10 is quite obviously 4. Also, the final value at 2019-08-10 12:03:16 should definitely be 6.0.
This is just an attempt to align the values to the second; after that, a simple interpolate seems to work.
Any help is appreciated.
I think you need the base parameter to shift the offset of the sampling period. Set it to the seconds of the first index value modulo 3 (because the period is 3 seconds), and use it together with Resampler.first:
df['new'] = df.resample('3s', base=df.index[0].second % 3).first()
print (df)
vals new
2019-08-10 12:03:05 1.0 1.0
2019-08-10 12:03:06 NaN NaN
2019-08-10 12:03:07 NaN NaN
2019-08-10 12:03:08 3.0 3.0
2019-08-10 12:03:09 4.0 NaN
2019-08-10 12:03:10 NaN NaN
2019-08-10 12:03:11 NaN 5.0
2019-08-10 12:03:12 5.0 NaN
2019-08-10 12:03:13 NaN NaN
2019-08-10 12:03:14 1.0 1.0
2019-08-10 12:03:15 NaN NaN
2019-08-10 12:03:16 NaN NaN
2019-08-10 12:03:17 6.0 6.0
Then interpolate:
df['new'] = df['new'].interpolate()
print (df)
vals new
2019-08-10 12:03:05 1.0 1.000000
2019-08-10 12:03:06 NaN 1.666667
2019-08-10 12:03:07 NaN 2.333333
2019-08-10 12:03:08 3.0 3.000000
2019-08-10 12:03:09 4.0 3.666667
2019-08-10 12:03:10 NaN 4.333333
2019-08-10 12:03:11 NaN 5.000000
2019-08-10 12:03:12 5.0 3.666667
2019-08-10 12:03:13 NaN 2.333333
2019-08-10 12:03:14 1.0 1.000000
2019-08-10 12:03:15 NaN 2.666667
2019-08-10 12:03:16 NaN 4.333333
2019-08-10 12:03:17 6.0 6.000000
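On recent pandas versions the `base` argument is no longer available (it was deprecated in 1.1 and removed in 2.0). Below is a minimal sketch of the same two steps. It assumes `origin="start"`, which anchors the bins at the first timestamp, is an acceptable substitute for computing the offset by hand. The sample frame is rebuilt from the question so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (one row per second).
idx = pd.date_range("2019-08-10 12:03:05", periods=13, freq="s")
vals = [1.0, np.nan, np.nan, 3.0, 4.0, np.nan, np.nan,
        5.0, np.nan, 1.0, np.nan, np.nan, 6.0]
df = pd.DataFrame({"vals": vals}, index=idx)

# Step 1: 3-second bins starting at the first timestamp; first() keeps the
# first non-NaN value found in each bin.
aligned = df["vals"].resample("3s", origin="start").first()

# Step 2: assigning aligns on the index, so rows between bin edges get NaN,
# which the default linear interpolation then fills.
df["new"] = aligned
df["new"] = df["new"].interpolate()

print(df)
```

Note that `first()` takes the first non-NaN value inside each 3-second bin rather than the value nearest to the bin edge. For this data the two coincide.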
Testing after adding 2 seconds to the index:
df.index += pd.Timedelta(2, 's')
df['new'] = df.resample('3s', base=df.index[0].second % 3).first()
print (df)
vals new
2019-08-10 12:03:07 1.0 1.0
2019-08-10 12:03:08 NaN NaN
2019-08-10 12:03:09 NaN NaN
2019-08-10 12:03:10 3.0 3.0
2019-08-10 12:03:11 4.0 NaN
2019-08-10 12:03:12 NaN NaN
2019-08-10 12:03:13 NaN 5.0
2019-08-10 12:03:14 5.0 NaN
2019-08-10 12:03:15 NaN NaN
2019-08-10 12:03:16 1.0 1.0
2019-08-10 12:03:17 NaN NaN
2019-08-10 12:03:18 NaN NaN
2019-08-10 12:03:19 6.0 6.0
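The question also asks that long gaps be left uninterpolated, which the steps above do not cover. One possible approach is sketched here. The helper `interpolate_short_gaps` and its `max_gap` parameter are hypothetical names, and a "gap" is assumed to mean a run of consecutive NaN rows. Using `interpolate(limit=...)` on its own would still fill the first rows of a long gap, so the long runs are masked out afterwards instead.

```python
import numpy as np
import pandas as pd

def interpolate_short_gaps(s: pd.Series, max_gap: int) -> pd.Series:
    """Linearly interpolate s, leaving runs of more than max_gap NaNs untouched."""
    is_nan = s.isna()
    # Give every run of consecutive NaN / non-NaN rows its own id.
    run_id = (is_nan != is_nan.shift()).cumsum()
    # Length of the run each row belongs to.
    run_len = is_nan.groupby(run_id).transform("size")
    # Fill only NaNs that have valid values on both sides.
    filled = s.interpolate(limit_area="inside")
    # Put NaN back wherever the original gap was too long.
    return filled.mask(is_nan & (run_len > max_gap))

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, np.nan, np.nan, 8.0])
out = interpolate_short_gaps(s, max_gap=2)
print(out)
```

In this small example the two-row gap between 1.0 and 4.0 is filled, while the three-row gap between 4.0 and 8.0 stays NaN.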
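# Assumes the timestamps are in a column named 'Time'; this interpolates every NaN directly, without the 3-second alignment step
df1 = df.set_index(['Time']).interpolate(method='linear').reset_index()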
print(df1)
Output
Time vals
0 2019-08-10 12:03:05 1.000000
1 2019-08-10 12:03:06 1.666667
2 2019-08-10 12:03:07 2.333333
3 2019-08-10 12:03:08 3.000000
4 2019-08-10 12:03:09 4.000000
5 2019-08-10 12:03:10 4.333333
6 2019-08-10 12:03:11 4.666667
7 2019-08-10 12:03:12 5.000000
8 2019-08-10 12:03:13 3.000000
9 2019-08-10 12:03:14 1.000000
10 2019-08-10 12:03:15 2.666667
11 2019-08-10 12:03:16 4.333333
12 2019-08-10 12:03:17 6.000000
If you want to replace the NaN values with the nearest value, you can use interpolation:
data['value'] = data['value'].interpolate(method='nearest')
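As far as I know, `interpolate(method='nearest')` relies on SciPy being installed. A pandas-only sketch of the same idea, assuming the goal is to give each row the value of the nearest non-NaN timestamp, is to drop the NaN rows first and then reindex with `method="nearest"`. The variable names and sample data below are illustrative, with the data taken from the question.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2019-08-10 12:03:05", periods=13, freq="s")
vals = [1.0, np.nan, np.nan, 3.0, 4.0, np.nan, np.nan,
        5.0, np.nan, 1.0, np.nan, np.nan, 6.0]
s = pd.Series(vals, index=idx, name="vals")

# Drop the NaN rows so only real observations can be picked as "nearest",
# then look up the closest remaining timestamp for every original row.
valid = s.dropna()
filled = valid.reindex(s.index, method="nearest")

print(filled)
```

Dropping the NaN rows first matters: a nearest lookup against the full index matches on timestamps only, so it can land on a row whose value is NaN, which is what the `resample(...).nearest()` output in the question shows.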