
How to change a sliding window generator to jump two elements at a time?

I've reproduced the sliding window code shown here, but I need to modify it to jump two elements at a time instead of just one.

Original code:

from itertools import islice

def window(seq, n=3):
    it = iter(seq)
    result = list(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + [elem,]
        yield result

If I start with the following list:

My_List= ['adl_01_11', 'adl_01_12', 'adl_01_13', 'adl_01_14', 'adl_02_15', 'adl_02_16', 'adl_02_17', 'adl_02_18', 'adl_02_19', 'adl_02_20', 'adl_02_21', 'adl_02_22']

and I apply window over My_List, I get the following result:

[['adl_01_11', 'adl_01_12', 'adl_01_13'], ['adl_01_12', 'adl_01_13', 'adl_01_14'], ['adl_01_13', 'adl_01_14', 'adl_02_15'], ['adl_01_14', 'adl_02_15', 'adl_02_16'], ['adl_02_15', 'adl_02_16', 'adl_02_17'], ['adl_02_16', 'adl_02_17', 'adl_02_18'], ['adl_02_17', 'adl_02_18', 'adl_02_19'], ['adl_02_18', 'adl_02_19', 'adl_02_20'], ['adl_02_19', 'adl_02_20', 'adl_02_21'], ['adl_02_20', 'adl_02_21', 'adl_02_22']]

How do I change this function if I want to advance 2 items at a time? That means I expect a result like this:

[['adl_01_11', 'adl_01_12', 'adl_01_13'], ['adl_01_13', 'adl_01_14', 'adl_01_15'], ['adl_01_15', 'adl_01_16', 'adl_02_17'], ['adl_01_17', 'adl_02_18', 'adl_02_19'], ['adl_02_19', 'adl_02_20', 'adl_02_21']]

Notice that adl_02_22 is no longer in the results, and my window advances 2 items at a time.

In the window function I tried changing result[1:] to result[2:], but it doesn't work well. Any ideas?

I propose three solutions for this problem:

  1. a specific one for a window of size 3 with a step of 2,
  2. a general one for any window size and step size,
  3. skip all this and use an existing library.

Solution 1: sliding window with hard-coded size=3 and step=2

If you replace the for elem in it: loop with the equivalent while True: loop, which calls next(it) until StopIteration is raised, you can call next(it) twice per iteration instead of just once:

def window_size3_step2(seq):
    it = iter(seq)

    try:
        result = [0,0,next(it)]
    except StopIteration:
        return

    while True:
        try:
            result = [result[2], next(it), next(it)]
        except StopIteration:
            break
        else:
            yield result


My_List= ['adl_01_11', 'adl_01_12', 'adl_01_13', 'adl_01_14', 'adl_02_15', 'adl_02_16', 'adl_02_17', 'adl_02_18', 'adl_02_19', 'adl_02_20', 'adl_02_21', 'adl_02_22']

print(f"{list(window_size3_step2(My_List))}")

Output:

[['adl_01_11', 'adl_01_12', 'adl_01_13'], ['adl_01_13', 'adl_01_14', 'adl_02_15'], ['adl_02_15', 'adl_02_16', 'adl_02_17'], ['adl_02_17', 'adl_02_18', 'adl_02_19'], ['adl_02_19', 'adl_02_20', 'adl_02_21']]

Testing with shorter inputs:

for n in range(7):
    print(f"len={n} result={list(window_size3_step2(range(n)))}")

len=0 result=[]
len=1 result=[]
len=2 result=[]
len=3 result=[[0, 1, 2]]
len=4 result=[[0, 1, 2]]
len=5 result=[[0, 1, 2], [2, 3, 4]]
len=6 result=[[0, 1, 2], [2, 3, 4]]
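The rewrite above relies on a for loop being shorthand for repeated next() calls that stop when StopIteration is raised. Here is a minimal sketch of that equivalence. The function names are hypothetical and exist only for illustration:

```python
def consume_with_for(seq):
    """Collect items using an ordinary for loop."""
    out = []
    for elem in iter(seq):
        out.append(elem)
    return out


def consume_with_while(seq):
    """Collect the same items by calling next() by hand."""
    it = iter(seq)
    out = []
    while True:
        try:
            elem = next(it)
        except StopIteration:
            break
        out.append(elem)
    return out


def consume_in_pairs(seq):
    """Call next() twice per iteration; a trailing unpaired item is dropped."""
    it = iter(seq)
    out = []
    while True:
        try:
            pair = (next(it), next(it))
        except StopIteration:
            break
        out.append(pair)
    return out
```

Once the loop is written with explicit next() calls, taking two items per iteration is a one-line change, and an incomplete final group is discarded because the exception fires before anything is appended.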

Solution 2: general window function with arbitrary size and step

This second solution goes back to using islice to honor the window size argument, which I've renamed size for clarity, and accepts a step argument that can be any positive integer.

from itertools import islice
def window(seq, size=3, step=1):
    if size < 1 or step < 1:
        raise ValueError("Nobody likes infinite loops.")
    it = iter(seq)
    result = list(islice(it, size))
    while len(result) == size:
        yield result
        if step >= size:
            result = list(islice(it, step-size, step))
        else:
            result = result[step:] + list(islice(it, step))

On your input list, window(My_List, size=3, step=2), or just window(My_List, step=2), yields the windows you want; wrap the call in list() to get the list of lists.
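To see what the two branches of the function do, here is a small sketch that runs each islice call by hand on range(10). The variable names are made up for this illustration:

```python
from itertools import islice

# step >= size: skip (step - size) items, then take the next `size` items.
it = iter(range(10))
first = list(islice(it, 2))            # size=2: takes 0 and 1
second = list(islice(it, 3 - 2, 3))    # step=3: skips 2, takes 3 and 4

# step < size: keep the overlapping tail and pull `step` new items.
it2 = iter(range(10))
w1 = list(islice(it2, 3))              # size=3: takes 0, 1 and 2
w2 = w1[2:] + list(islice(it2, 2))     # step=2: keeps 2, adds 3 and 4

print(first, second)  # [0, 1] [3, 4]
print(w1, w2)         # [0, 1, 2] [2, 3, 4]
```

In both branches, a result shorter than size means the input ran out, which is what ends the while loop.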

I've also tested this with a wide variety of input lengths, window sizes and step sizes, and it worked correctly in every case I tried. For example, the output of this loop (try it yourself; I don't want to paste the long output here) is correct on every line:

for input_size in range(10):
    for window_size in range(1,4):
        for step_size in range(1,4):
            print(f"len={input_size} size={window_size} step={step_size} "
                  f"result={list(window(range(input_size), size=window_size, step=step_size))}")
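Rather than checking that output by eye, one option is to compare it against a simpler reference. The sketch below is an assumption of mine, not part of the solution above: window_by_slicing is a hypothetical helper that builds the same windows with plain slicing, and it only works on inputs that fit in memory as a list.

```python
def window_by_slicing(seq, size=3, step=1):
    """Reference version for finite inputs (hypothetical helper)."""
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive")
    seq = list(seq)
    # The last valid start index is len(seq) - size; any later start
    # would produce an incomplete window, so the range stops there.
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]


print(window_by_slicing(range(6), size=3, step=2))  # [[0, 1, 2], [2, 3, 4]]
```

Inside the test loop, comparing list(window(...)) with window_by_slicing(...) for the same arguments turns the visual check into an assertion.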

Solution 3: there's a library for this!

The more_itertools library already provides a function that does just this.

I had to install it first:

pip3 install more_itertools

Use it:

from more_itertools import windowed
print(f"{list(windowed(My_List, 3, step=2))}")

[('adl_01_11', 'adl_01_12', 'adl_01_13'), ('adl_01_13', 'adl_01_14', 'adl_02_15'), ('adl_02_15', 'adl_02_16', 'adl_02_17'), ('adl_02_17', 'adl_02_18', 'adl_02_19'), ('adl_02_19', 'adl_02_20', 'adl_02_21'), ('adl_02_21', 'adl_02_22', None)]

It's not exactly what you asked for, though, because it pads the last incomplete window with None (or any fill value you provide) instead of truncating the end.
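If you want the truncating behaviour anyway, one possible workaround is to pad with a unique sentinel and discard any window that contains it. This is a sketch under assumptions: drop_padded and _MISSING are hypothetical names, and the padded list is hand-written to mirror the shape of the windowed output shown above, so that the example runs without more_itertools installed.

```python
_MISSING = object()  # unique sentinel, so real None values are not mistaken for padding


def drop_padded(windows, fill=_MISSING):
    """Yield only the windows that contain no padding value (hypothetical helper)."""
    for w in windows:
        if not any(item is fill for item in w):
            yield w


# Stand-in for windowed(range(6), 3, fillvalue=_MISSING, step=2):
# two complete windows followed by one padded window.
padded = [(0, 1, 2), (2, 3, 4), (4, 5, _MISSING)]
complete = list(drop_padded(padded))
print(complete)  # [(0, 1, 2), (2, 3, 4)]
```

With the library installed, the same helper could wrap windowed(My_List, 3, fillvalue=_MISSING, step=2) directly.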

While using existing libraries is often a good choice, I learned more by writing solutions 1 and 2, and I hope you find value in the progression.

Credits:

I found the more_itertools solution here: https://stackoverflow.com/a/46412374/3216427

I think you may have transcribed your expected output incorrectly.

The items adl_01_15, adl_01_16 and adl_01_17 in your expected output do not exist in My_List.

If so, this will do:

islice(window(My_List), 0, None, 2)

and if you don't need a generator:

list(window(My_List))[::2]

# input
lst = ['adl_01_11', 'adl_01_12', 'adl_01_13', 'adl_01_14', 'adl_02_15', 'adl_02_16', 'adl_02_17', 'adl_02_18', 'adl_02_19', 'adl_02_20', 'adl_02_21', 'adl_02_22']
# drop the last entry when the length is even, so every window is complete
# (the form lst[:-k] would empty the list whenever k is 0)
lst = lst[:len(lst) - (len(lst) - 1) % 2]
# get midpoints of sublist and add previous and following value to it
lst = [[lst[i-1], x, lst[i+1]] for i, x in enumerate(lst) if ((i+1) % 2) == 0]

print(lst)
# [['adl_01_11', 'adl_01_12', 'adl_01_13'],
#  ['adl_01_13', 'adl_01_14', 'adl_02_15'],
#  ['adl_02_15', 'adl_02_16', 'adl_02_17'],
#  ['adl_02_17', 'adl_02_18', 'adl_02_19'],
#  ['adl_02_19', 'adl_02_20', 'adl_02_21']]
