
Efficient way to get evenly-spaced data / pandas DataFrame.reindex

In order to compare different data sets, I need a way to put them on a common time basis. What is the most efficient way to achieve this?

I've tried a few approaches, and to my understanding the easiest should be pandas DataFrame.reindex.

I have an unevenly spaced time array with associated values for a status (on/off), and each new status persists after its entry. So I want to keep using the previous value of the status column until a new value is set at a later time.

A typical array looks like this; df is a one-column DataFrame with time as the index and status as the column:

In [58]: df
Out[58]: 
           status
time             
1632160022      0
1632986376   <NA>
1632986496      0
1633448715      1
1633452437      0
1633454358      1
1633461201      0
1633534763      1
1633551686      0 
...
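For anyone who wants to reproduce this, the rows shown above can be rebuilt as follows. Only the displayed rows are included (the elided tail is not), and the nullable Int64 dtype is an assumption based on the <NA> marker in the output.

```python
import pandas as pd

# Rows copied from the output above; the "..." tail is not reproduced.
# Nullable Int64 is assumed because the output shows <NA> rather than NaN.
df = pd.DataFrame(
    {"status": pd.array([0, pd.NA, 0, 1, 0, 1, 0, 1, 0], dtype="Int64")},
    index=pd.Index(
        [1632160022, 1632986376, 1632986496, 1633448715, 1633452437,
         1633454358, 1633461201, 1633534763, 1633551686],
        name="time",
    ),
)

print(df)
```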

From the pandas DataFrame.reindex docs, I understand that re-indexing with the fill method pad / ffill should yield the previous value:

# creating evenly-spaced time base for observation duration
tmin = min(df.index)
tmax = max(df.index)
tspacing = 120
tbase = list(range(tmin, tmax, tspacing))

# create the temporally evenly-spaced DataFrame
ndf = df.reindex(index=tbase, method='pad', tolerance=120)

However, the result differs from what I expect: all subsequent status entries are assigned NaN instead of the forward-filled value:

In [62]: ndf
Out[62]: 
           status
time             
1632160022      0
1632160142      0
1632160262    NaN
1632160382    NaN
1632160502    NaN
          ...
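To see where the NaN values come from, the label matching that reindex performs can be reproduced with Index.get_indexer, using the same method and tolerance. This sketch uses only the first two timestamps from the question and the first five 120 s grid points; -1 marks a grid point with no acceptable match, which reindex turns into a missing value.

```python
import pandas as pd

# Excerpt: the first two original timestamps and the first five grid
# points that range(tmin, tmax, 120) would produce.
obs_times = pd.Index([1632160022, 1632986376], name="time")
grid = pd.Index(range(1632160022, 1632160022 + 5 * 120, 120))

# Position of the original label each grid point would be filled from.
# With tolerance, a match is kept only if |original - grid point| <= 120.
with_tol = obs_times.get_indexer(grid, method="pad", tolerance=120)
without_tol = obs_times.get_indexer(grid, method="pad")

print(with_tol)     # offsets 0 s and 120 s match label 0; later points get -1
print(without_tol)  # every grid point matches label 0
```

This mirrors the output above: the first two grid points receive 0 and everything after them is NaN.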

Any idea what I'm missing or doing wrong? Or, if this method doesn't do the trick, is there another ready-made method available?

"As such I want to use the previous value of the status column until a new value at a new time for the status is set."

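IIUC (if I understand correctly), the NaNs come from tolerance=120: with a fill method, tolerance is the maximum allowed distance between an original label and a new one, so every grid point more than 120 s after the most recent recorded timestamp is left unfilled. Drop it and let ffill carry the last status forward: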

ndf = df.reindex(tbase, method='ffill')
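A self-contained sketch of this fix, run on the first five rows shown in the question (the Int64 dtype is assumed from the <NA> marker):

```python
import pandas as pd

# First five rows from the question; nullable Int64 assumed from the <NA> marker.
df = pd.DataFrame(
    {"status": pd.array([0, pd.NA, 0, 1, 0], dtype="Int64")},
    index=pd.Index(
        [1632160022, 1632986376, 1632986496, 1633448715, 1633452437],
        name="time",
    ),
)

# Evenly spaced 120 s grid, built as in the question (tmax itself is excluded).
tspacing = 120
tbase = range(int(df.index.min()), int(df.index.max()), tspacing)

# No tolerance: each grid point takes the status of the most recent original
# timestamp at or before it, however far back that timestamp is.
ndf = df.reindex(tbase, method="ffill")

print(ndf.head())
```

Because the fill only compares index labels, not values, grid points between 1632986376 and 1632986496 inherit the <NA> recorded at 1632986376. If that row should mean "no change" rather than "unknown", calling .ffill() on ndf afterwards would carry the previous status through that gap.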
