In a Pandas time series, how do I get, for each row, the last value before a delay expires?

Question

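I have a time series in Pandas with a single value column A. I want to generate a second column B that contains the last value before a certain delay (relative to the original row's time) expires. The rows are not evenly spaced in time. Is there an efficient way to do this in Pandas (or Numpy)? The DataFrame may contain millions of rows, and I would like this operation to take a few seconds at most.

Here is an example: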

time  A
10:00 10
11:00 20
11:05 30
11:15 20

Let the delay be 10 minutes. Then the result should be:

time  A  B
10:00 10 10    # In 10 minutes the value is still the same
11:00 20 30    # In 5 < 10 minutes, the value will have changed 
11:05 30 30    # Exactly, not less than 10 minutes
11:15 20 20    # Last row contains the same value

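Edit: If there is no fast Pandas/Numpy solution, I will just code it in Numba. However, for some reason my past Numba solutions to similar problems (nopython with nested for & break) were rather slow, which is why I am asking for a better approach.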

Answer 1

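Here is one approach. The key is the searchsorted function, which finds the insertion index of each delayed time value in the sorted time column. Subtracting 1 from that index gives the last row before the delay expires: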

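import numpy as np
import pandas as pd

df = pd.DataFrame({'time': ['10:00', '11:00', '11:05', '11:15'],
                   'A': [10, 20, 30, 20]})
# Parse 'HH:MM' strings as timedeltas (to_timedelta expects 'HH:MM:SS').
df['time'] = pd.to_timedelta(df['time'] + ':00')
# Time at which the delay expires for each row.
t2 = df['time'] + pd.to_timedelta('10min')
# Insertion index of each expiry time; requires 'time' to be sorted ascending.
idx = df['time'].searchsorted(t2)
# The row just before the insertion point holds the last value before expiry.
df['B'] = df.iloc[idx - 1]['A'].values
print(df)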
#       time   A   B
# 0 10:00:00  10  10
# 1 11:00:00  20  30
# 2 11:05:00  30  30
# 3 11:15:00  20  20
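
The same idea can be sketched as a small NumPy function, which may make the boundary behaviour easier to see. This is only an illustrative sketch, not part of the original answer: the function name `last_value_before_delay` and the `include_exact` switch are hypothetical, and it assumes the times are sorted ascending and the delay is positive. With the default `side="left"`, a row that falls exactly at the expiry time is not counted, which matches the 11:05 row in the question.

```python
import numpy as np
import pandas as pd


def last_value_before_delay(times, values, delay, include_exact=False):
    """Return, for each row, the last value seen before `time + delay`.

    Hypothetical helper for illustration. Assumes `times` is sorted
    ascending and `delay` is positive (so every index is at least 1).
    """
    times = np.asarray(times)
    values = np.asarray(values)
    # side="left": a row exactly at the expiry time is excluded.
    # side="right": a row exactly at the expiry time is included.
    side = "right" if include_exact else "left"
    idx = np.searchsorted(times, times + delay, side=side)
    return values[idx - 1]


times = pd.to_timedelta(
    ["10:00:00", "11:00:00", "11:05:00", "11:15:00"]
).to_numpy()
values = np.array([10, 20, 30, 20])
delay = np.timedelta64(10, "m")

b_strict = last_value_before_delay(times, values, delay)
b_inclusive = last_value_before_delay(times, values, delay, include_exact=True)
print(b_strict)     # [10 30 30 20]
print(b_inclusive)  # [10 30 20 20]
```

Since searchsorted performs a binary search per row, the whole operation is O(n log n) and needs no Python-level loop, which should suit the millions-of-rows case in the question.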
