Fill list with last value if date gap is greater than N seconds
Suppose I have the list data:
import numpy as np
import datetime
np.random.seed(0)
aux = [10,30,50,60,70,110,120]
base = datetime.datetime(2018, 1, 1, 22, 34, 20)
data = [[base + datetime.timedelta(seconds=s),
         round(np.random.rand(),3)] for s in aux]
This gives:
data ==
[[datetime.datetime(2018, 1, 1, 22, 34, 30), 0.549],
[datetime.datetime(2018, 1, 1, 22, 34, 50), 0.715],
[datetime.datetime(2018, 1, 1, 22, 35, 10), 0.603],
[datetime.datetime(2018, 1, 1, 22, 35, 20), 0.545],
[datetime.datetime(2018, 1, 1, 22, 35, 30), 0.424],
[datetime.datetime(2018, 1, 1, 22, 36, 10), 0.646],
[datetime.datetime(2018, 1, 1, 22, 36, 20), 0.438]]
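What I want to do is fill in the rows that are missing wherever two consecutive dates are more than 10 seconds apart, using the last known value. For this example, the output should be: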
desired_output ==
[[datetime.datetime(2018, 1, 1, 22, 34, 30), 0.549],
[datetime.datetime(2018, 1, 1, 22, 34, 40), 0.549],
[datetime.datetime(2018, 1, 1, 22, 34, 50), 0.715],
[datetime.datetime(2018, 1, 1, 22, 35), 0.715],
[datetime.datetime(2018, 1, 1, 22, 35, 10), 0.603],
[datetime.datetime(2018, 1, 1, 22, 35, 20), 0.545],
[datetime.datetime(2018, 1, 1, 22, 35, 30), 0.424],
[datetime.datetime(2018, 1, 1, 22, 35, 40), 0.424],
[datetime.datetime(2018, 1, 1, 22, 35, 50), 0.424],
[datetime.datetime(2018, 1, 1, 22, 36), 0.424],
[datetime.datetime(2018, 1, 1, 22, 36, 10), 0.646],
[datetime.datetime(2018, 1, 1, 22, 36, 20), 0.438]]
I can't think of any clever way to do this. All gaps between dates are multiples of 10 seconds. Any ideas?
If you are open to using Pandas, this is easy to do with a reindexing operation:
>>> import pandas as pd
>>> df = pd.DataFrame(data, columns=['date', 'value'])
>>> ridx = df.set_index('date').asfreq('10s').ffill().reset_index()
>>> ridx
                  date  value
0  2018-01-01 22:34:30  0.549
1  2018-01-01 22:34:40  0.549
2  2018-01-01 22:34:50  0.715
3  2018-01-01 22:35:00  0.715
4  2018-01-01 22:35:10  0.603
5  2018-01-01 22:35:20  0.545
6  2018-01-01 22:35:30  0.424
7  2018-01-01 22:35:40  0.424
8  2018-01-01 22:35:50  0.424
9  2018-01-01 22:36:00  0.424
10 2018-01-01 22:36:10  0.646
11 2018-01-01 22:36:20  0.438
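.asfreq('10s') inserts the missing 10-second timestamps, with NaN as their value. .ffill() then "forward-fills" those missing values with the last valid value seen.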
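To see the two steps separately, here is a minimal sketch on a toy three-point series. The name `s` and its values are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical toy series: the 22:34:40 slot is missing.
s = pd.Series(
    [0.1, 0.2, 0.3],
    index=pd.to_datetime([
        "2018-01-01 22:34:30",
        "2018-01-01 22:34:50",
        "2018-01-01 22:35:00",
    ]),
)

# Step 1: conform the index to a fixed 10-second grid; new slots hold NaN.
gridded = s.asfreq("10s")

# Step 2: carry the last valid observation forward into the NaN slots.
filled = gridded.ffill()

print(gridded)
print(filled)
```

Printing `gridded` before `filled` makes it visible that `asfreq` only creates the empty slots; the actual filling is done by `ffill`.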
To get back to the data structure you have now (note that the elements will be 2-tuples rather than lists of length 2):
>>> native_ridx = list(zip(ridx['date'].dt.to_pydatetime().tolist(), ridx['value']))
>>> from pprint import pprint
>>> pprint(native_ridx[:5])
[(datetime.datetime(2018, 1, 1, 22, 34, 30), 0.549),
(datetime.datetime(2018, 1, 1, 22, 34, 40), 0.549),
(datetime.datetime(2018, 1, 1, 22, 34, 50), 0.715),
(datetime.datetime(2018, 1, 1, 22, 35), 0.715),
(datetime.datetime(2018, 1, 1, 22, 35, 10), 0.603)]
To confirm:
>>> assert all(tuple(i) == j for i, j in zip(desired_output, native_ridx))
The same thing can be done in native Python, without Pandas:
import datetime

def make_daterange(
    start: datetime.datetime,
    end: datetime.datetime,
    incr=datetime.timedelta(seconds=10)
):
    # Yield start, start + incr, ... until end is reached
    yield start
    while start < end:
        start += incr
        yield start

def reindex_ffill(data: list, incr=datetime.timedelta(seconds=10)):
    dates, _ = zip(*data)
    data = dict(data)  # map each date to its value for fast lookup
    start, end = min(dates), max(dates)
    daterng = make_daterange(start, end, incr)
    # If the initial value is missing (None), the element at [0][1] will be NaN
    lastvalid = float("nan")
    get = data.get
    for date in daterng:
        value = get(date)
        # Compare against None so that a legitimate 0.0 is not treated as missing
        if value is not None:
            yield date, value
            lastvalid = value
        else:
            yield date, lastvalid
Example:
>>> pynative_ridx = list(reindex_ffill(data))
>>> assert all(tuple(i) == j for i, j in zip(desired_output, pynative_ridx))
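If you would rather keep lists of length 2 and avoid the dictionary lookup, one possible variant walks over consecutive pairs of rows and fills only the space between them. This is a sketch under the assumption that the rows are already sorted by timestamp; `fill_gaps` and `sample` are hypothetical names introduced here for illustration.

```python
import datetime

def fill_gaps(rows, step=datetime.timedelta(seconds=10)):
    """Return a new list of [timestamp, value] rows with gaps forward-filled.

    Assumes `rows` is sorted by timestamp.
    """
    filled = []
    for i, (stamp, value) in enumerate(rows):
        filled.append([stamp, value])
        if i + 1 == len(rows):
            break  # last row: nothing to fill after it
        next_stamp = rows[i + 1][0]
        cursor = stamp + step
        # Repeat the current value until the next real timestamp is reached.
        while cursor < next_stamp:
            filled.append([cursor, value])
            cursor += step
    return filled

# Illustrative input (made-up values), including a 0.0 reading.
base = datetime.datetime(2018, 1, 1, 22, 34, 20)
sample = [[base + datetime.timedelta(seconds=s), v]
          for s, v in [(10, 0.5), (30, 0.0), (60, 0.7)]]
result = fill_gaps(sample)
for row in result:
    print(row)
```

Because nothing here tests a value for truthiness, a reading of 0.0 is carried forward like any other. The loop also never generates a timestamp beyond the last row, even when a gap is not an exact multiple of `step`.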