How to efficiently iterate "n-wise" over an iterator

Question

Possibly a duplicate, but I couldn't find anything.

I have a very long iterator (10000 items) and I need to iterate over it in overlapping windows of about 500 items at a time. So if my iterator was range(10000), it would look like this:

Iteration #1: 0, 1, 2, ... 497, 498, 499
Iteration #2: 1, 2, 3, ... 498, 499, 500
Iteration #3: 2, 3, 4, ... 499, 500, 501
Iteration #4: 3, 4, 5, ... 500, 501, 502
...
Iteration #9500: 9499, 9500, 9501 ... 9996, 9997, 9998
Iteration #9501: 9500, 9501, 9502 ... 9997, 9998, 9999

and so on. There is this method:

def nwise_slice(lst, n):
    for i in range(len(lst) - n + 1):
        yield lst[i:i + n]

However, this doesn't work with lazy iterators. I tried to build a solution on iterators, adapting the itertools pairwise and consume recipes (see here):

import itertools

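def nwise_iter(lst, n):
    # Make n independent copies of the input iterator.
    iters = itertools.tee(lst, n)
    for idx, itr in enumerate(iters):
        # Advance the idx-th copy by idx items (the "consume" recipe).
        next(itertools.islice(itr, idx, idx), None)

    for group in zip(*iters):
        yield group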

which does the same (albeit yielding a tuple rather than a list, which does not matter to me). I also believe it doesn't create a lot of unnecessary slices. This solution works on non-sliceable iterators, like files (which I plan to work with). However, the itertools solution was about 2x slower:

In [4]: %timeit list(nwise_slice(list(range(10000)), 500))
46.9 ms ± 729 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: %timeit list(nwise_iter(list(range(10000)), 500))
102 ms ± 3.95 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

I don't want to have to load all of my test data into memory to take advantage of the slice method. Is there a more efficient way to pull this off?
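To make the difference between the two functions in the question concrete, here is a minimal sketch that feeds both of them a generator, which has no len() and cannot be sliced. The helper lazy_numbers is a hypothetical stand-in for a lazy source such as a file; it is not part of the original question.

```python
import itertools

def nwise_slice(lst, n):
    for i in range(len(lst) - n + 1):
        yield lst[i:i + n]

def nwise_iter(iterable, n):
    iters = itertools.tee(iterable, n)
    for idx, itr in enumerate(iters):
        # Advance the idx-th copy by idx items (the "consume" recipe).
        next(itertools.islice(itr, idx, idx), None)
    yield from zip(*iters)

def lazy_numbers(count):
    # Hypothetical lazy source: yields items one at a time, like a file would.
    for i in range(count):
        yield i

# The tee-based version accepts the generator directly.
windows = list(nwise_iter(lazy_numbers(6), 3))
print(windows)

# The slice-based version needs len() and slicing, so it raises TypeError.
slice_failed = False
try:
    list(nwise_slice(lazy_numbers(6), 3))
except TypeError:
    slice_failed = True
print("slice version failed:", slice_failed)
```

With six items and a window of three, the tee-based version produces four overlapping tuples, while the slice-based version fails as soon as it calls len() on the generator.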

Answer 1

What about using a deque to "memoize" your items?

from collections import deque

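def nwise_slice(it, n):
    # A deque with maxlen=n drops its oldest item on each append.
    deq = deque((), n)
    for x in it:
        deq.append(x)
        # Start yielding once the window is full. The same deque object
        # is yielded (and mutated) on every step.
        if len(deq) == n:
            yield deq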

my_range = range(8)
for sub in nwise_slice(my_range, 5):
    print(sub)
# =>
# deque([0, 1, 2, 3, 4], maxlen=5)
# deque([1, 2, 3, 4, 5], maxlen=5)
# deque([2, 3, 4, 5, 6], maxlen=5)
# deque([3, 4, 5, 6, 7], maxlen=5)
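One thing to watch for with the deque approach: the generator yields the same deque object every time, so collecting the results with list() leaves you with several references to the final window. The sketch below shows this and one possible variant that yields an independent tuple per window. The names nwise_shared and nwise_deque are made up for this illustration, the variant assumes n >= 1, and its speed has not been measured against the functions in the question.

```python
from collections import deque
from itertools import islice

def nwise_shared(iterable, n):
    # Same logic as the answer above: one deque object is yielded repeatedly.
    deq = deque((), n)
    for x in iterable:
        deq.append(x)
        if len(deq) == n:
            yield deq

def nwise_deque(iterable, n):
    # Variant (assumes n >= 1): pre-fill n - 1 items, then yield a tuple
    # snapshot after each append so every window is independent.
    it = iter(iterable)
    window = deque(islice(it, n - 1), maxlen=n)
    for item in it:
        window.append(item)
        yield tuple(window)

shared = list(nwise_shared(range(8), 5))
# Every collected element is the very same deque object.
print(all(w is shared[0] for w in shared))

copied = list(nwise_deque(range(8), 5))
print(copied)
```

The tuple copy costs O(n) per window, so if each window is consumed immediately inside the loop, yielding the shared deque as in the answer avoids that extra work.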

Answer 1
4, accepted, 2019-01-20 20:50:58
