Efficient way to get evenly-spaced data / pandas DataFrame.reindex
To compare different data sets, I need a way to put them on a common time base. What is the most efficient way to achieve this?
I've tried a few ways, and to my understanding the easiest should be pandas DataFrame.reindex:
I have an unevenly spaced time array with associated values for the new status (on/off), which persists after the entry. As such, I want to use the previous value of the status column until a new value is set at a later time.
A typical array looks like this; df is a one-column DataFrame with time as index and status as column:
In [58]: df
Out[58]:
status
time
1632160022 0
1632986376 <NA>
1632986496 0
1633448715 1
1633452437 0
1633454358 1
1633461201 0
1633534763 1
1633551686 0
...
From the docs of pandas DataFrame.reindex I read that re-indexing with the fill method pad / ffill should yield the previous value:
# creating evenly-spaced time base for observation duration
tmin = min(df.index)
tmax = max(df.index)
tspacing = 120
tbase = [t for t in range(tmin,tmax,tspacing)]
# create the temporally evenly-spaced DataFrame
ndf = df.reindex(index=tbase, method='pad', tolerance=120)
However, the result differs from what I expect: all subsequent status entries are assigned NaN instead of the forward-filled value:
In[62]: ndf
Out[62]:
status
time
1632160022 0
1632160142 0
1632160262 NaN
1632160382 NaN
1632160502 NaN
...
Any idea what I'm missing or doing wrong? Or, if this method does not do the trick, is there another ready-made method available?
As such, I want to use the previous value of the status column until a new value is set at a later time.
IIUC, you just need to drop the tolerance argument:
ndf = df.reindex(tbase, method='ffill')
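To see why this helps, here is a minimal sketch, assuming a reduced sample taken from the rows shown in the question (the <NA> row is left out for brevity, and the names with_tol and no_tol are only illustrative). In reindex, tolerance is the maximum allowed distance between a new label and the original label it is filled from, so with tolerance=120 only grid points within 120 seconds of a recorded change get a value and everything later becomes NaN.

```python
import pandas as pd

# Reduced, hypothetical sample based on the question's data (the <NA> row is omitted).
df = pd.DataFrame(
    {"status": [0, 0, 1, 0]},
    index=pd.Index(
        [1632160022, 1632986496, 1633448715, 1633452437], name="time"
    ),
)

# Evenly spaced time base covering the observation period, as in the question.
tspacing = 120
tbase = list(range(int(df.index.min()), int(df.index.max()) + 1, tspacing))

# With tolerance: a grid point is filled only if the previous original
# label is at most 120 seconds away; otherwise it is set to NaN.
with_tol = df.reindex(tbase, method="ffill", tolerance=120)

# Without tolerance: the last known status is carried forward
# until the next recorded change.
no_tol = df.reindex(tbase, method="ffill")

print(with_tol.head())
print(no_tol.head())
```

If a limit on how long a status may be carried forward is actually wanted, tolerance can be kept but set to that maximum gap (in the same units as the index) rather than to the grid spacing.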