
Sliding time window with Python deque

I have a deque. Each element of the deque consists of a time field and an event field, so it is similar to a list of dicts. The data is always sorted by time from oldest to newest; the first element of the deque is the oldest.

Please note that the deque is unbounded: new elements keep being added at unpredictable times. A new element may arrive after 1 minute or after 1 hour. Who knows...

data = [
        {
            "time": "07:14:40",
            "event": 24
        },
        {
            "time": "07:15:40",
            "event": 394
        },
        {
            "time": "07:16:40",
            "event": 384
        },
        {
            "time": "07:17:40",
            "event": 394
        },
        {
            "time": "07:18:40",
            "event": 384
        },
        {
            "time": "07:19:40",
            "event": 2
        },
        {
            "time": "07:20:40",
            "event": 24
        },
        {
            "time": "07:21:40",
            "event": 72
        },
        {
            "time": "07:22:40",
            "event": 24
        },
        {
            "time": "07:23:40",
            "event": 72
        },
        {
            "time": "07:24:40",
            "event": 99
        }
    ]

I'm also given a window size. Let it be 5 minutes.

I want to iterate over this deque with the given window size and calculate an expanding moving sum. Let me elaborate on what this means.

During every iteration over this deque, I have to check whether the current and older elements are inside the 5-minute window and sum up those that are. If older elements are outside the 5-minute window, they should be popped from the deque.

In other words, during the first iteration the start date will be

07:09:40 (going 5 minutes back)

and the end date will be

07:14:40

and the sum will be 24. During the second iteration, the element's time is not inside this date range, so I have to redefine my date range in the following way:

the start date will be

07:10:40

and the end date will be

07:15:40

Now I have to look back and check all previous elements. The time of the first element is

07:14:40

which is inside my new date range, so I compute a new sum (24 + 394).
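The date-range arithmetic for this second iteration can be sketched with the standard `datetime` module. This is only an illustration: the `parse` helper and the variable names are hypothetical, and the bounds are treated as inclusive, as in the description above.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)


def parse(t: str) -> datetime:
    # Hypothetical helper: the sample data stores times as "HH:MM:SS" strings.
    return datetime.strptime(t, "%H:%M:%S")


# The two elements seen so far during the second iteration.
seen = [
    {"time": "07:14:40", "event": 24},
    {"time": "07:15:40", "event": 394},
]

end_date = parse(seen[-1]["time"])  # 07:15:40, the current element
start_date = end_date - WINDOW      # 07:10:40

# Keep every element whose time falls inside [start_date, end_date].
in_window = [r for r in seen if start_date <= parse(r["time"]) <= end_date]
window_sum = sum(r["event"] for r in in_window)

print(start_date.time(), end_date.time(), window_sum)  # 07:10:40 07:15:40 418
```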

During the third iteration, the time field is again outside my previous date range, so I redefine the date range in the same manner as in the previous iteration and compute the sum similarly.

When I reach the following element (7th iteration)

"time": "07:20:40",
"event": 24

my date range will be:

start date:

07:15:40

end date:

07:20:40

Then I have to look back and grab all the elements whose time field is inside this date range. Note that the first element is outside the date range, so I have to pop it from the deque. This is my question: how can I do this?

This is the code fragment I wrote, but it does not work.

from collections import deque, defaultdict
window_size = 300

test = deque(sort_data(list(read_json("final_real_test.json").values())[0]))


result = defaultdict(list)
final_input = deque()

end_date = test[0]["time"]
start_date = end_date - datetime.timedelta(seconds=window_size)


while test:
    record = test.popleft()

    if start_date <= record["time"] <= end_date:
        # Calculate the sum
        final_input.append(record)
    else:
        end_date = record["time"]
        start_date = end_date - datetime.timedelta(seconds=window_size)

        print("Returning back to the queue...")
        test.appendleft(record)
        print("Done")

You did not explain how your deque is updated, or how that should affect the window processing.

But here is a proof-of-concept of the algorithm:

from datetime import datetime
from typing import Generator, List, Dict, Union

Element = Dict[str, Union[str, int]]
Series = List[Element]

def sliding_window(series: Series, window_duration: int) -> Generator[Series, None, None]:
    time_format = "%H:%M:%S"
    if len(series) > 0:
        for i_ending_item, ending_item in enumerate(series):
            end_window_time = datetime.strptime(ending_item["time"], time_format)
            print(f"window ends at item n°{i_ending_item} ({end_window_time!r})")
            window = [ending_item]
            for window_candidate_item in reversed(series[0:max(i_ending_item, 0)]):
                candidate_time = datetime.strptime(window_candidate_item["time"], time_format)
                assert end_window_time > candidate_time
                candidate_delta = end_window_time - candidate_time
                print(f"  {candidate_time=!r} {candidate_delta=!r} {candidate_delta.seconds=!r}")
                if candidate_delta.total_seconds() < window_duration:  # non inclusive
                    print("    added to the window")
                    window.insert(0, window_candidate_item)
                else:
                    print("  stop there")
                    break
            else:
                print("  reached the beginning of the series")
            yield window

DATA: Series = [
    {"time": "07:14:40", "event": 24},
    {"time": "07:15:40", "event": 394},
    {"time": "07:16:40", "event": 384},
    {"time": "07:17:40", "event": 394},
    {"time": "07:18:40", "event": 384},
    {"time": "07:19:40", "event": 2},
    {"time": "07:20:40", "event": 24},
    {"time": "07:21:40", "event": 72},
    {"time": "07:22:40", "event": 24},
    {"time": "07:23:40", "event": 72},
    {"time": "07:24:40", "event": 99}
]
WINDOW_SIZE = 5*60

for window in sliding_window(DATA, WINDOW_SIZE):
    print(window, "sum=", sum(item["event"] for item in window))

which produces

window ends at item n°0 (datetime.datetime(1900, 1, 1, 7, 14, 40))
  reached the beginning of the series
[{'time': '07:14:40', 'event': 24}] sum= 24
window ends at item n°1 (datetime.datetime(1900, 1, 1, 7, 15, 40))
  candidate_time=datetime.datetime(1900, 1, 1, 7, 14, 40) candidate_delta=datetime.timedelta(seconds=60) candidate_delta.seconds=60
    added to the window
  reached the beginning of the series
[{'time': '07:14:40', 'event': 24}, {'time': '07:15:40', 'event': 394}] sum= 418
window ends at item n°2 (datetime.datetime(1900, 1, 1, 7, 16, 40))
  candidate_time=datetime.datetime(1900, 1, 1, 7, 15, 40) candidate_delta=datetime.timedelta(seconds=60) candidate_delta.seconds=60
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 14, 40) candidate_delta=datetime.timedelta(seconds=120) candidate_delta.seconds=120
    added to the window
  reached the beginning of the series
[{'time': '07:14:40', 'event': 24}, {'time': '07:15:40', 'event': 394}, {'time': '07:16:40', 'event': 384}] sum= 802
window ends at item n°3 (datetime.datetime(1900, 1, 1, 7, 17, 40))
  candidate_time=datetime.datetime(1900, 1, 1, 7, 16, 40) candidate_delta=datetime.timedelta(seconds=60) candidate_delta.seconds=60
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 15, 40) candidate_delta=datetime.timedelta(seconds=120) candidate_delta.seconds=120
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 14, 40) candidate_delta=datetime.timedelta(seconds=180) candidate_delta.seconds=180
    added to the window
  reached the beginning of the series
[{'time': '07:14:40', 'event': 24}, {'time': '07:15:40', 'event': 394}, {'time': '07:16:40', 'event': 384}, {'time': '07:17:40', 'event': 394}] sum= 1196
window ends at item n°4 (datetime.datetime(1900, 1, 1, 7, 18, 40))
  candidate_time=datetime.datetime(1900, 1, 1, 7, 17, 40) candidate_delta=datetime.timedelta(seconds=60) candidate_delta.seconds=60
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 16, 40) candidate_delta=datetime.timedelta(seconds=120) candidate_delta.seconds=120
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 15, 40) candidate_delta=datetime.timedelta(seconds=180) candidate_delta.seconds=180
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 14, 40) candidate_delta=datetime.timedelta(seconds=240) candidate_delta.seconds=240
    added to the window
  reached the beginning of the series
[{'time': '07:14:40', 'event': 24}, {'time': '07:15:40', 'event': 394}, {'time': '07:16:40', 'event': 384}, {'time': '07:17:40', 'event': 394}, {'time': '07:18:40', 'event': 384}] sum= 1580
window ends at item n°5 (datetime.datetime(1900, 1, 1, 7, 19, 40))
  candidate_time=datetime.datetime(1900, 1, 1, 7, 18, 40) candidate_delta=datetime.timedelta(seconds=60) candidate_delta.seconds=60
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 17, 40) candidate_delta=datetime.timedelta(seconds=120) candidate_delta.seconds=120
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 16, 40) candidate_delta=datetime.timedelta(seconds=180) candidate_delta.seconds=180
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 15, 40) candidate_delta=datetime.timedelta(seconds=240) candidate_delta.seconds=240
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 14, 40) candidate_delta=datetime.timedelta(seconds=300) candidate_delta.seconds=300
  stop there
[{'time': '07:15:40', 'event': 394}, {'time': '07:16:40', 'event': 384}, {'time': '07:17:40', 'event': 394}, {'time': '07:18:40', 'event': 384}, {'time': '07:19:40', 'event': 2}] sum= 1558
window ends at item n°6 (datetime.datetime(1900, 1, 1, 7, 20, 40))
  candidate_time=datetime.datetime(1900, 1, 1, 7, 19, 40) candidate_delta=datetime.timedelta(seconds=60) candidate_delta.seconds=60
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 18, 40) candidate_delta=datetime.timedelta(seconds=120) candidate_delta.seconds=120
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 17, 40) candidate_delta=datetime.timedelta(seconds=180) candidate_delta.seconds=180
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 16, 40) candidate_delta=datetime.timedelta(seconds=240) candidate_delta.seconds=240
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 15, 40) candidate_delta=datetime.timedelta(seconds=300) candidate_delta.seconds=300
  stop there
[{'time': '07:16:40', 'event': 384}, {'time': '07:17:40', 'event': 394}, {'time': '07:18:40', 'event': 384}, {'time': '07:19:40', 'event': 2}, {'time': '07:20:40', 'event': 24}] sum= 1188
window ends at item n°7 (datetime.datetime(1900, 1, 1, 7, 21, 40))
  candidate_time=datetime.datetime(1900, 1, 1, 7, 20, 40) candidate_delta=datetime.timedelta(seconds=60) candidate_delta.seconds=60
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 19, 40) candidate_delta=datetime.timedelta(seconds=120) candidate_delta.seconds=120
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 18, 40) candidate_delta=datetime.timedelta(seconds=180) candidate_delta.seconds=180
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 17, 40) candidate_delta=datetime.timedelta(seconds=240) candidate_delta.seconds=240
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 16, 40) candidate_delta=datetime.timedelta(seconds=300) candidate_delta.seconds=300
  stop there
[{'time': '07:17:40', 'event': 394}, {'time': '07:18:40', 'event': 384}, {'time': '07:19:40', 'event': 2}, {'time': '07:20:40', 'event': 24}, {'time': '07:21:40', 'event': 72}] sum= 876
window ends at item n°8 (datetime.datetime(1900, 1, 1, 7, 22, 40))
  candidate_time=datetime.datetime(1900, 1, 1, 7, 21, 40) candidate_delta=datetime.timedelta(seconds=60) candidate_delta.seconds=60
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 20, 40) candidate_delta=datetime.timedelta(seconds=120) candidate_delta.seconds=120
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 19, 40) candidate_delta=datetime.timedelta(seconds=180) candidate_delta.seconds=180
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 18, 40) candidate_delta=datetime.timedelta(seconds=240) candidate_delta.seconds=240
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 17, 40) candidate_delta=datetime.timedelta(seconds=300) candidate_delta.seconds=300
  stop there
[{'time': '07:18:40', 'event': 384}, {'time': '07:19:40', 'event': 2}, {'time': '07:20:40', 'event': 24}, {'time': '07:21:40', 'event': 72}, {'time': '07:22:40', 'event': 24}] sum= 506
window ends at item n°9 (datetime.datetime(1900, 1, 1, 7, 23, 40))
  candidate_time=datetime.datetime(1900, 1, 1, 7, 22, 40) candidate_delta=datetime.timedelta(seconds=60) candidate_delta.seconds=60
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 21, 40) candidate_delta=datetime.timedelta(seconds=120) candidate_delta.seconds=120
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 20, 40) candidate_delta=datetime.timedelta(seconds=180) candidate_delta.seconds=180
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 19, 40) candidate_delta=datetime.timedelta(seconds=240) candidate_delta.seconds=240
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 18, 40) candidate_delta=datetime.timedelta(seconds=300) candidate_delta.seconds=300
  stop there
[{'time': '07:19:40', 'event': 2}, {'time': '07:20:40', 'event': 24}, {'time': '07:21:40', 'event': 72}, {'time': '07:22:40', 'event': 24}, {'time': '07:23:40', 'event': 72}] sum= 194
window ends at item n°10 (datetime.datetime(1900, 1, 1, 7, 24, 40))
  candidate_time=datetime.datetime(1900, 1, 1, 7, 23, 40) candidate_delta=datetime.timedelta(seconds=60) candidate_delta.seconds=60
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 22, 40) candidate_delta=datetime.timedelta(seconds=120) candidate_delta.seconds=120
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 21, 40) candidate_delta=datetime.timedelta(seconds=180) candidate_delta.seconds=180
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 20, 40) candidate_delta=datetime.timedelta(seconds=240) candidate_delta.seconds=240
    added to the window
  candidate_time=datetime.datetime(1900, 1, 1, 7, 19, 40) candidate_delta=datetime.timedelta(seconds=300) candidate_delta.seconds=300
  stop there
[{'time': '07:20:40', 'event': 24}, {'time': '07:21:40', 'event': 72}, {'time': '07:22:40', 'event': 24}, {'time': '07:23:40', 'event': 72}, {'time': '07:24:40', 'event': 99}] sum= 291

To me it seems to answer your question: how to have a sliding window based on the time of the events.

I used a list for the data for simplicity. If you share a Minimal Reproducible Example, it will be simpler to answer your question.
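Since the question asks specifically about popping expired elements from a deque, here is a hedged sketch of that variant. It is not the code above: the class name `SlidingWindowSum` and its `push` method are hypothetical, and it assumes events arrive in time order, all on the same day, with the same non-inclusive 5-minute bound as the proof-of-concept. It keeps a running total, so each event is appended once and popped at most once.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, Tuple


class SlidingWindowSum:
    """Hypothetical helper: running sum of events over a trailing time window."""

    def __init__(self, window_seconds: int) -> None:
        self.window = timedelta(seconds=window_seconds)
        self.items: Deque[Tuple[datetime, int]] = deque()
        self.total = 0

    def push(self, time_str: str, event: int) -> int:
        """Add one event (assumed newer than all previous ones); return the window sum."""
        now = datetime.strptime(time_str, "%H:%M:%S")
        self.items.append((now, event))
        self.total += event
        # Evict from the left while the oldest item is a full window (or more) old.
        while self.items and now - self.items[0][0] >= self.window:
            _, expired = self.items.popleft()
            self.total -= expired
        return self.total


DATA = [
    ("07:14:40", 24), ("07:15:40", 394), ("07:16:40", 384), ("07:17:40", 394),
    ("07:18:40", 384), ("07:19:40", 2), ("07:20:40", 24), ("07:21:40", 72),
    ("07:22:40", 24), ("07:23:40", 72), ("07:24:40", 99),
]

tracker = SlidingWindowSum(5 * 60)
sums = [tracker.push(time_str, event) for time_str, event in DATA]
print(sums)
# [24, 418, 802, 1196, 1580, 1558, 1188, 876, 506, 194, 291]
```

The sums match the output of the proof-of-concept. Because `push` handles one event at a time, it does not matter whether the next element arrives after a minute or after an hour; anything that has aged out is dropped when the next event is pushed.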
