Fill list with last value if date gap is greater than N seconds

Suppose I have the list data:

import numpy as np
import datetime

np.random.seed(0)
aux = [10,30,50,60,70,110,120]
base = datetime.datetime(2018, 1, 1, 22, 34, 20)
data = [[base + datetime.timedelta(seconds=s), 
         round(np.random.rand(),3)] for s in aux]

This returns:

data == 

[[datetime.datetime(2018, 1, 1, 22, 34, 30), 0.549],
 [datetime.datetime(2018, 1, 1, 22, 34, 50), 0.715],
 [datetime.datetime(2018, 1, 1, 22, 35, 10), 0.603],
 [datetime.datetime(2018, 1, 1, 22, 35, 20), 0.545],
 [datetime.datetime(2018, 1, 1, 22, 35, 30), 0.424],
 [datetime.datetime(2018, 1, 1, 22, 36, 10), 0.646],
 [datetime.datetime(2018, 1, 1, 22, 36, 20), 0.438]]

What I want to do is fill in the places where the gap between consecutive dates is greater than 10 seconds, using the last previous value. For this example, the output should be:

desired_output ==

[[datetime.datetime(2018, 1, 1, 22, 34, 30), 0.549],
 [datetime.datetime(2018, 1, 1, 22, 34, 40), 0.549],
 [datetime.datetime(2018, 1, 1, 22, 34, 50), 0.715],
 [datetime.datetime(2018, 1, 1, 22, 35), 0.715],
 [datetime.datetime(2018, 1, 1, 22, 35, 10), 0.603],
 [datetime.datetime(2018, 1, 1, 22, 35, 20), 0.545],
 [datetime.datetime(2018, 1, 1, 22, 35, 30), 0.424],
 [datetime.datetime(2018, 1, 1, 22, 35, 40), 0.424],
 [datetime.datetime(2018, 1, 1, 22, 35, 50), 0.424],
 [datetime.datetime(2018, 1, 1, 22, 36), 0.424],
 [datetime.datetime(2018, 1, 1, 22, 36, 10), 0.646],
 [datetime.datetime(2018, 1, 1, 22, 36, 20), 0.438]]

I can't think of any smart way to do this. All dates are separated by multiples of 10 seconds. Any ideas?

Option 1: with Pandas

If you're open to using Pandas, it makes reindexing operations like this easy:

>>> import pandas as pd
>>> df = pd.DataFrame(data, columns=['date', 'value'])
>>> ridx = df.set_index('date').asfreq('10s').ffill().reset_index()
>>> ridx
                  date  value
0  2018-01-01 22:34:30  0.549
1  2018-01-01 22:34:40  0.549
2  2018-01-01 22:34:50  0.715
3  2018-01-01 22:35:00  0.715
4  2018-01-01 22:35:10  0.603
5  2018-01-01 22:35:20  0.545
6  2018-01-01 22:35:30  0.424
7  2018-01-01 22:35:40  0.424
8  2018-01-01 22:35:50  0.424
9  2018-01-01 22:36:00  0.424
10 2018-01-01 22:36:10  0.646
11 2018-01-01 22:36:20  0.438

.asfreq('10s') converts the index to a regular 10-second frequency, inserting a row (with a NaN value) for every missing timestamp. .ffill() then "forward-fills" each missing value with the last valid value seen.
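To make the two steps visible separately, here is a minimal sketch on a three-row frame. The names small, regular, and filled are made up for this illustration and are not part of the original answer.

```python
import datetime
import pandas as pd

base = datetime.datetime(2018, 1, 1, 22, 34, 30)
# Three readings: a 20-second gap after the first one, then a normal 10-second step
small = pd.DataFrame(
    {
        "date": [base + datetime.timedelta(seconds=s) for s in (0, 20, 30)],
        "value": [0.549, 0.715, 0.603],
    }
)

# asfreq inserts the missing 22:34:40 row; its value is NaN at this point
regular = small.set_index("date").asfreq("10s")

# ffill replaces that NaN with the value from the previous row
filled = regular.ffill()

print(regular)
print(filled)
```

Printing regular before filled shows that asfreq only creates the missing rows; it is ffill that decides what goes into them.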

To get back to the data structure that you have now (though note that the elements will be 2-tuples, rather than lists of length 2):

>>> native_ridx = list(zip(ridx['date'].dt.to_pydatetime().tolist(), ridx['value']))
>>> from pprint import pprint
>>> pprint(native_ridx[:5])
[(datetime.datetime(2018, 1, 1, 22, 34, 30), 0.549),
 (datetime.datetime(2018, 1, 1, 22, 34, 40), 0.549),
 (datetime.datetime(2018, 1, 1, 22, 34, 50), 0.715),
 (datetime.datetime(2018, 1, 1, 22, 35), 0.715),
 (datetime.datetime(2018, 1, 1, 22, 35, 10), 0.603)]

To confirm:

>>> assert all(tuple(i) == j for i, j in zip(desired_output, native_ridx))

Option 2: Native Python

import datetime

def make_daterange(
    start: datetime.datetime,
    end: datetime.datetime,
    incr=datetime.timedelta(seconds=10)
):
    yield start
    while start < end:
        start += incr
        yield start

def reindex_ffill(data: list, incr=datetime.timedelta(seconds=10)):
    dates, _ = zip(*data)
    data = dict(data)
    start, end = min(dates), max(dates)
    daterng = make_daterange(start, end, incr)
    # Until a valid value has been seen, missing entries are reported as NaN.
    # float("nan") keeps this option free of the NumPy dependency.
    lastvalid = float("nan")
    get = data.get
    for date in daterng:
        value = get(date)
        # Compare against None so a legitimate 0.0 reading is not treated as missing
        if value is not None:
            yield date, value
            lastvalid = value
        else:
            yield date, lastvalid

Example:

>>> pynative_ridx = list(reindex_ffill(data))
>>> assert all(tuple(i) == j for i, j in zip(desired_output, pynative_ridx))
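As a variation on Option 2, the gaps can also be filled by walking the list once and comparing each timestamp with the previous one, without building a dictionary or a full date range. This is only a sketch: the function name fill_gaps is hypothetical, and it assumes the input is already sorted by date and that all gaps are multiples of the step, as stated in the question.

```python
import datetime

def fill_gaps(rows, step=datetime.timedelta(seconds=10)):
    """Return a new list in which every gap longer than `step` is filled
    with copies of the previous value. Assumes `rows` is sorted by date."""
    filled = []
    for stamp, value in rows:
        if filled:
            prev_stamp, prev_value = filled[-1]
            # Emit one filler row per missing step before the current timestamp
            missing = prev_stamp + step
            while missing < stamp:
                filled.append([missing, prev_value])
                missing += step
        filled.append([stamp, value])
    return filled

base = datetime.datetime(2018, 1, 1, 22, 34, 30)
sample = [
    [base, 0.549],
    [base + datetime.timedelta(seconds=20), 0.0],   # a real reading of zero
    [base + datetime.timedelta(seconds=50), 0.603],
]
result = fill_gaps(sample)
offsets = [(stamp - base).total_seconds() for stamp, _ in result]
values = [value for _, value in result]
print(offsets)
print(values)
```

Unlike the Pandas version, this returns lists of length 2, so the result can be compared to desired_output directly, and a reading of 0.0 is carried forward like any other value.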
