
Iterate an iterator by chunks (of n) in Python?

Can you think of a nice way (maybe with itertools) to split an iterator into chunks of given size?

Therefore l=[1,2,3,4,5,6,7] with chunks(l,3) becomes an iterator yielding [1,2,3], [4,5,6], [7].

I can think of a small program to do that, but not a nice way with maybe itertools.

The grouper() recipe from the itertools documentation's recipes comes close to what you want:

from itertools import izip_longest  # Python 2; on Python 3, use zip_longest

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

It will fill up the last chunk with a fill value, though.

A less general solution that only works on sequences, but does handle the last chunk as desired, is:

[my_list[i:i + chunk_size] for i in range(0, len(my_list), chunk_size)]
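For instance, with hypothetical my_list and chunk_size values:

```python
my_list = [1, 2, 3, 4, 5, 6, 7]
chunk_size = 3

# Slice the sequence at fixed offsets; the final slice is simply shorter.
result = [my_list[i:i + chunk_size] for i in range(0, len(my_list), chunk_size)]
print(result)  # [[1, 2, 3], [4, 5, 6], [7]]
```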

Finally, a solution that works on general iterators and behaves as desired is:

import itertools

def grouper(n, iterable):
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk
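A quick check of this generator (repeating the definition so the snippet runs standalone):

```python
import itertools

def grouper(n, iterable):
    # Same generator as above: slice n items at a time until the iterator is empty.
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk

print(list(grouper(3, [1, 2, 3, 4, 5, 6, 7])))  # [(1, 2, 3), (4, 5, 6), (7,)]
```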

Although the OP asks the function to return chunks as a list or tuple, in case you need to return iterators, Sven Marnach's solution can be modified:

import itertools

def grouper_it(n, iterable):
    it = iter(iterable)
    while True:
        chunk_it = itertools.islice(it, n)
        try:
            first_el = next(chunk_it)
        except StopIteration:
            return
        yield itertools.chain((first_el,), chunk_it)
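A quick sketch of how this iterator-of-iterators version behaves (repeating the definition so it runs standalone):

```python
import itertools

def grouper_it(n, iterable):
    # Same as above: yields lazy sub-iterators instead of tuples.
    it = iter(iterable)
    while True:
        chunk_it = itertools.islice(it, n)
        try:
            first_el = next(chunk_it)
        except StopIteration:
            return
        yield itertools.chain((first_el,), chunk_it)

for chunk in grouper_it(3, range(7)):
    print(list(chunk))
# [0, 1, 2]
# [3, 4, 5]
# [6]
```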

Some benchmarks: http://pastebin.com/YkKFvm8b

It will be slightly more efficient only if your function iterates through elements in every chunk.

This will work on any iterable. It returns a generator of generators (for full flexibility). I now realize that it's basically the same as @reclosedevs solution, but without the fluff. No need for try...except, as the StopIteration propagates up, which is what we want. (Caveat: since PEP 479, which is mandatory from Python 3.7, a StopIteration raised inside a generator body is converted to RuntimeError, so on modern Python this version needs an explicit try/except after all.)

The next(iterable) call is needed to raise the StopIteration when the iterable is empty, since islice will continue spawning empty generators forever if you let it.

It's better because it's only two lines long, yet easy to comprehend.

import itertools

def grouper(iterable, n):
    while True:
        yield itertools.chain((next(iterable),), itertools.islice(iterable, n-1))

Note that next(iterable) is put into a tuple. Otherwise, if next(iterable) itself were iterable, then itertools.chain would flatten it out. Thanks to Jeremy Brown for pointing out this issue.
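A small illustration of why the one-element tuple matters, using made-up data where the pulled element happens itself to be iterable:

```python
from itertools import chain

first = [1, 2]          # imagine next(iterable) returned a list
rest = iter([3, 4])

# Wrapping in a 1-tuple keeps `first` as a single element of the chunk...
wrapped = list(chain((first,), rest))
print(wrapped)  # [[1, 2], 3, 4]

# ...whereas passing it directly splices its contents into the chunk.
rest = iter([3, 4])
flattened = list(chain(first, rest))
print(flattened)  # [1, 2, 3, 4]
```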

I was working on something today and came up with what I think is a simple solution. It is similar to jsbueno's answer, but I believe his would yield empty groups when the length of iterable is divisible by n. My answer does a simple check when the iterable is exhausted.

def chunk(iterable, chunk_size):
    """Generate sequences of `chunk_size` elements from `iterable`."""
    iterable = iter(iterable)
    while True:
        chunk = []
        try:
            for _ in range(chunk_size):
                chunk.append(next(iterable))
            yield chunk
        except StopIteration:
            if chunk:
                yield chunk
            break
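For instance, a sketch reusing the definition above, showing that no empty trailing chunk is produced when the length divides evenly:

```python
def chunk(iterable, chunk_size):
    """Generate sequences of `chunk_size` elements from `iterable` (as above)."""
    iterable = iter(iterable)
    while True:
        chunk = []
        try:
            for _ in range(chunk_size):
                chunk.append(next(iterable))
            yield chunk
        except StopIteration:
            if chunk:
                yield chunk
            break

print(list(chunk(range(6), 3)))  # [[0, 1, 2], [3, 4, 5]] -- no trailing empty chunk
print(list(chunk(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```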

Since Python 3.8, there is a simpler solution using the := operator:

import itertools
from collections.abc import Iterator

def grouper(it: Iterator, n: int) -> Iterator[list]:
    # islice always slices from the iterator's current position, so plain
    # islice(it, n) takes the next n items; advancing (start, stop) offsets
    # would silently skip elements between chunks.
    while chunk := list(itertools.islice(it, n)):
        yield chunk

Usage:

>>> list(grouper(iter('ABCDEFG'), 3))
[['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]

Here's one that returns lazy chunks; use map(list, chunks(...)) if you want lists.

from itertools import islice, chain
from collections import deque

def chunks(items, n):
    items = iter(items)
    for first in items:
        chunk = chain((first,), islice(items, n-1))
        yield chunk
        deque(chunk, 0)

if __name__ == "__main__":
    for chunk in map(list, chunks(range(10), 3)):
        print(chunk)

    for i, chunk in enumerate(chunks(range(10), 3)):
        if i % 2 == 1:
            print("chunk #%d: %s" % (i, list(chunk)))
        else:
            print("skipping #%d" % i)

A succinct implementation is:

from itertools import ifilterfalse, izip_longest  # Python 2; filterfalse/zip_longest on Python 3

chunker = lambda iterable, n: (ifilterfalse(lambda x: x == (), chunk) for chunk in (izip_longest(*[iter(iterable)]*n, fillvalue=())))

This works because [iter(iterable)]*n is a list containing the same iterator n times; zipping over that takes one item from each iterator in the list, which is the same iterator, with the result that each zip-element contains a group of n items.
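The same-iterator trick can be seen in isolation (a sketch using the Python 3 name zip_longest in place of izip_longest):

```python
from itertools import zip_longest

it = iter('ABCDEFG')
args = [it] * 3            # three references to the SAME iterator
print(args[0] is args[1])  # True

# Each output tuple pulls three successive items from that one iterator.
groups = list(zip_longest(*args, fillvalue='x'))
print(groups)  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```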

izip_longest is needed to fully consume the underlying iterable, rather than iteration stopping when the first exhausted iterator is reached, which would chop off any remainder from iterable. This results in the need to filter out the fill value. A slightly more robust implementation would therefore be:

from itertools import ifilterfalse, izip_longest  # Python 2; filterfalse/zip_longest on Python 3

def chunker(iterable, n):
    class Filler(object): pass
    return (ifilterfalse(lambda x: x is Filler, chunk) for chunk in (izip_longest(*[iter(iterable)]*n, fillvalue=Filler)))

This guarantees that the fill value is never an item in the underlying iterable. Using the definition above:

iterable = range(1,11)

map(tuple,chunker(iterable, 3))
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10,)]

map(tuple,chunker(iterable, 2))
[(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]

map(tuple,chunker(iterable, 4))
[(1, 2, 3, 4), (5, 6, 7, 8), (9, 10)]

This implementation almost does what you want, but it has issues:

from itertools import islice

def chunks(it, step):
    start = 0
    while True:
        end = start + step
        yield islice(it, start, end)
        start = end

(The difference is that because islice does not raise StopIteration, or anything else, on calls that go beyond the end of it, this will yield forever; there is also the slightly tricky issue that the islice results must be consumed before this generator is iterated further.)
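A small demonstration of the problem: past the end of the iterator, islice just keeps returning empty slices rather than raising anything.

```python
from itertools import islice

it = iter([1, 2, 3, 4, 5])
first = list(islice(it, 0, 2))
second = list(islice(it, 0, 2))
third = list(islice(it, 0, 2))
fourth = list(islice(it, 0, 2))
print(first, second, third, fourth)
# [1, 2] [3, 4] [5] [] -- and it stays empty forever; islice never raises
```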

To generate the moving window functionally:

izip(count(0, step), count(step, step))

So this becomes:

(it[start:end] for (start,end) in izip(count(0, step), count(step, step)))

But that still creates an infinite iterator. So you need takewhile (or perhaps something else might be better) to limit it:

from itertools import takewhile, izip, count  # Python 2; use zip instead of izip on Python 3

chunk = lambda it, step: takewhile((lambda x: len(x) > 0), (it[start:end] for (start,end) in izip(count(0, step), count(step, step))))

g = chunk(range(1,11), 3)

tuple(g)
([1, 2, 3], [4, 5, 6], [7, 8, 9], [10])

"Simpler is better than complex" - a straightforward generator a few lines long can do the job. Just place it in some utilities module:

def grouper(iterable, n):
    iterable = iter(iterable)
    count = 0
    group = []
    while True:
        try:
            group.append(next(iterable))
            count += 1
            if count % n == 0:
                yield group
                group = []
        except StopIteration:
            if group:  # avoid yielding an empty final group when the length divides evenly by n
                yield group
            break

I forget where I found the inspiration for this. I've modified it a little to work with MSI GUIDs in the Windows Registry:

def nslice(s, n, truncate=False, reverse=False):
    """Splits s into n-sized chunks, optionally reversing the chunks."""
    assert n > 0
    while len(s) >= n:
        if reverse: yield s[:n][::-1]
        else: yield s[:n]
        s = s[n:]
    if len(s) and not truncate:
        yield s

reverse doesn't apply to your question, but it's something I use extensively with this function.

>>> [i for i in nslice([1,2,3,4,5,6,7], 3)]
[[1, 2, 3], [4, 5, 6], [7]]
>>> [i for i in nslice([1,2,3,4,5,6,7], 3, truncate=True)]
[[1, 2, 3], [4, 5, 6]]
>>> [i for i in nslice([1,2,3,4,5,6,7], 3, truncate=True, reverse=True)]
[[3, 2, 1], [6, 5, 4]]

Here you go.

def chunksiter(l, chunks):
    # Each chunk runs from index i to i+chunks; i advances by the chunk size.
    i, n = 0, 0
    rl = []
    while n < len(l)/chunks:
        rl.append(l[i:i+chunks])
        i += chunks
        n += 1
    return iter(rl)


def chunksiter2(l, chunks):
    i, n = 0, 0
    while n < len(l)/chunks:
        yield l[i:i+chunks]
        i += chunks
        n += 1

Examples:

for l in chunksiter([1,2,3,4,5,6,7,8],3):
    print(l)

[1, 2, 3]
[4, 5, 6]
[7, 8]

for l in chunksiter2([1,2,3,4,5,6,7,8],3):
    print(l)

[1, 2, 3]
[4, 5, 6]
[7, 8]


for l in chunksiter2([1,2,3,4,5,6,7,8],5):
    print(l)

[1, 2, 3, 4, 5]
[6, 7, 8]

Code golf edition:

def grouper(iterable, n):
    for i in range(-(-len(iterable)//n)):
        yield iterable[i*n:i*n+n]

Usage:

>>> list(grouper('ABCDEFG', 3))
['ABC', 'DEF', 'G']

This function takes iterables which do not need to be Sized, so it will accept iterators too. It supports infinite iterables and will error out if chunks with a size smaller than 1 are selected (even though giving size == 1 is effectively useless).

The type annotations are of course optional, and the / in the parameters (which makes iterable positional-only) can be removed if you wish.

from typing import TypeVar, Iterable, Generator

T = TypeVar("T")


def chunk(iterable: Iterable[T], /, size: int) -> Generator[list[T], None, None]:
    """Yield chunks of a given size from an iterable."""
    if size < 1:
        raise ValueError("Cannot make chunks smaller than 1 item.")

    def chunker():
        current_chunk = []
        for item in iterable:
            current_chunk.append(item)

            if len(current_chunk) == size:
                yield current_chunk

                current_chunk = []

        if current_chunk:
            yield current_chunk

    # Chunker generator is returned instead of yielding directly so that the size check
    #  can raise immediately instead of waiting for the first next() call.
    return chunker()
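A quick usage sketch, repeating the definition so it runs standalone; note the size check raises immediately, before any iteration:

```python
from typing import TypeVar, Iterable, Generator

T = TypeVar("T")

def chunk(iterable: Iterable[T], /, size: int) -> Generator[list[T], None, None]:
    """Yield chunks of a given size from an iterable (same definition as above)."""
    if size < 1:
        raise ValueError("Cannot make chunks smaller than 1 item.")

    def chunker():
        current_chunk = []
        for item in iterable:
            current_chunk.append(item)
            if len(current_chunk) == size:
                yield current_chunk
                current_chunk = []
        if current_chunk:
            yield current_chunk

    # Returning the inner generator lets the size check fire eagerly.
    return chunker()

out = list(chunk(iter('ABCDEFG'), size=3))
print(out)  # [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
```

Calling chunk([], size=0) raises the ValueError at call time rather than on the first next().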

A couple of improvements on reclosedev's answer that make it:

  1. Operate more efficiently and with less boilerplate code in the loop, by delegating the pulling of the first element to Python itself rather than manually doing so with a next call in a try / except StopIteration: block

  2. Handle the case where the user discards the rest of the elements in any given chunk (e.g. an inner loop over the chunk breaks under certain conditions); in reclosedev's solution, aside from the very first element (which is definitely consumed), any other "skipped" elements aren't actually skipped (they just become the initial elements of the next chunk, which means you're no longer pulling data from n-aligned offsets, and if the caller breaks a loop over a chunk, they must manually consume the remaining elements even if they don't need them)

Combining those two fixes gets:

import collections  # At top of file
from itertools import chain, islice  # At top of file, denamespaced for slight speed boost

# Pre-create a utility "function" that silently consumes and discards all remaining elements in
# an iterator. This is the fastest way to do so on CPython (deque has a specialized mode
# for maxlen=0 that pulls and discards faster than Python level code can, and by precreating
# the deque and prebinding the extend method, you don't even need to create new deques each time)
_consume = collections.deque(maxlen=0).extend

def batched_it(iterable, n):
    "Batch data into sub-iterators of length n. The last batch may be shorter."
    # batched_it('ABCDEFG', 3) --> ABC DEF G
    if n < 1:
        raise ValueError('n must be at least one')
    n -= 1  # First element pulled for us, pre-decrement n so we don't redo it every loop
    it = iter(iterable)
    for first_el in it:
        chunk_it = islice(it, n)
        try:
            yield chain((first_el,), chunk_it)
        finally:
            _consume(chunk_it)  # Efficiently consume any elements caller didn't consume
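For example (repeating the definition above so the snippet runs standalone), even if the caller looks at only the first element of each chunk, the unread tail is consumed automatically, so every chunk still starts at an n-aligned offset:

```python
import collections
from itertools import chain, islice

_consume = collections.deque(maxlen=0).extend

def batched_it(iterable, n):
    # Same generator as above: lazily yields sub-iterators of length n,
    # discarding any elements of a chunk the caller leaves unconsumed.
    if n < 1:
        raise ValueError('n must be at least one')
    n -= 1
    it = iter(iterable)
    for first_el in it:
        chunk_it = islice(it, n)
        try:
            yield chain((first_el,), chunk_it)
        finally:
            _consume(chunk_it)

firsts = [next(c) for c in batched_it(range(10), 3)]
print(firsts)  # [0, 3, 6, 9] -- skipped tail elements don't leak into later chunks
```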

