The reverse of itertools.groupby?

Question

I'm chaining generators together to do some data processing. First I batch the data generator so that the API calls can be made on threads, e.g.:

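from itertools import groupby, count
from typing import Any, List
def batch(data: List[Any], size=4):
    # next(c)//size is 0 for the first `size` items, 1 for the next `size`, ...
    # so groupby starts a new group every `size` items
    c = count()
    for _, g in groupby(data, lambda _: next(c)//size):
        yield g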
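A quick way to check what batch produces is to materialize each group as it arrives. This is a minimal sketch using range(10) as stand-in data. Each group has to be consumed before the outer loop advances, because groupby's groups share the underlying iterator.

```python
from itertools import count, groupby
from typing import Any, Iterable, Iterator

def batch(data: Iterable[Any], size: int = 4) -> Iterator[Iterator[Any]]:
    # next(c) // size maps positions 0..3 -> 0, 4..7 -> 1, and so on,
    # so groupby starts a new group every `size` items
    c = count()
    for _, g in groupby(data, lambda _: next(c) // size):
        yield g

# Materialize each group immediately: a groupby group is only valid
# until the outer iterator advances.
batches = [list(g) for g in batch(range(10))]
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Calling list(batch(...)) first and only then iterating the groups would find the earlier groups already exhausted, per the groupby documentation.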

Then I feed that into a threader to make the API calls:

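from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable
def thread(data: Iterable, func: Callable, n=4):
    with ThreadPoolExecutor(max_workers=n) as executor:
        for batch in data:
            # executor.map returns an iterator of results, so this yields one iterator per batch
            yield executor.map(func, batch)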
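To see what thread actually yields, here is a minimal runnable pipeline. fake_api is a hypothetical stand-in for the real API call (it just squares its input), and batch is repeated so the block runs on its own. Each item that thread yields is the iterator returned by executor.map, not an individual result, which is why a separate flattening step is needed afterwards.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import count, groupby
from typing import Any, Callable, Iterable, Iterator

def batch(data: Iterable[Any], size: int = 4) -> Iterator[Iterator[Any]]:
    c = count()
    for _, g in groupby(data, lambda _: next(c) // size):
        yield g

def thread(data: Iterable, func: Callable, n: int = 4) -> Iterator[Iterator[Any]]:
    with ThreadPoolExecutor(max_workers=n) as executor:
        for chunk in data:
            # executor.map submits every item of the chunk right away and
            # returns an iterator over the results, in input order
            yield executor.map(func, chunk)

def fake_api(x: int) -> int:
    # Hypothetical stand-in for a real API call
    return x * x

per_batch = [list(results) for results in thread(batch(range(10)), fake_api)]
print(per_batch)  # [[0, 1, 4, 9], [16, 25, 36, 49], [64, 81]]
```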

Now I'm trying to merge the batches back into a single list/generator for use further down the generator pipeline. I tried this:

from itertools import chain
from typing import Iterable
def flat_map(batches: Iterable):
    for i in list(chain(batches)):
        yield i

But i still seems to be a generator rather than an item from the list?
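The symptom can be reproduced without any threads. In this minimal sketch, plain lists stand in for the executor.map iterators; the flat_map below is the question's version unchanged. Because chain() receives a single iterable, it walks that iterable and yields its items, which are the batches themselves.

```python
from itertools import chain
from typing import Iterable

def flat_map(batches: Iterable):
    # The question's version: chain() gets ONE iterable, so it yields
    # that iterable's items, i.e. the batches, not their contents
    for i in list(chain(batches)):
        yield i

batches = [[1, 2], [3, 4], [5]]  # plain lists standing in for executor.map iterators
print(list(flat_map(batches)))  # [[1, 2], [3, 4], [5]]
```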

Answer 1

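You want chain(*batches) or chain.from_iterable(batches). chain(batches) receives just one iterable, so it produces the same values you'd get by iterating batches directly (the batches themselves); it only adds a layer of wrapping. So the correct code (without list, which is almost certainly wrong here because it forces the whole pipeline to run to completion before anything is yielded) is just: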

from itertools import chain
from typing import Iterable
def flat_map(batches: Iterable):
    # chain(*batches) also works, but if batches is itself an iterator it must run to completion
    # first; chain.from_iterable can begin work as soon as the first batch is ready
    return chain.from_iterable(batches)
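The difference between chain(*batches) and chain.from_iterable(batches) shows up when batches is a lazy generator. In this minimal sketch, make_batches is a hypothetical generator that records in a log each time it produces a batch, so the two call styles can be compared.

```python
from itertools import chain
from typing import Iterator, List

log: List[str] = []

def make_batches() -> Iterator[List[int]]:
    # Hypothetical lazy source that records when each batch is produced
    for start in (0, 2, 4):
        log.append(f"batch {start // 2}")
        yield [start, start + 1]

# chain(*...) unpacks the generator at call time, producing every batch up front
eager = chain(*make_batches())
eager_log = list(log)  # ['batch 0', 'batch 1', 'batch 2']

log.clear()
# chain.from_iterable pulls the next batch only when the previous one is used up
lazy = chain.from_iterable(make_batches())
first = next(lazy)  # 0
lazy_log = list(log)  # ['batch 0']

print(eager_log, lazy_log)
print(list(eager), [first, *lazy])  # [0, 1, 2, 3, 4, 5] [0, 1, 2, 3, 4, 5]
```

In a threaded pipeline this means chain.from_iterable lets downstream code start on the first batch's results while later batches are still being produced.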

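You don't even need yield, since the iterator already produces what you want. If you need it to be a real generator, just replace return with yield from to get a similar result.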
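With yield from, flat_map becomes a real generator function while producing the same output. A minimal sketch, with nested lists standing in for the batches:

```python
from itertools import chain
from typing import Any, Iterable, Iterator

def flat_map(batches: Iterable[Iterable[Any]]) -> Iterator[Any]:
    # A true generator function: yield from re-yields every item that
    # chain.from_iterable produces, with one level of nesting removed
    yield from chain.from_iterable(batches)

flat = flat_map([[1, 2], [3, 4], [5]])
print(list(flat))  # [1, 2, 3, 4, 5]
```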

Also note: you can avoid needing that function entirely by changing:

yield executor.map(func, batch)

to:

yield from executor.map(func, batch)

so that thread does the flattening itself.
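Here is a sketch of thread with the yield from change applied. The batch helper is repeated so the block runs standalone, and fake_api is again a hypothetical stand-in for the API call. The per-item results come out flat and in input order, because executor.map preserves input order.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import count, groupby
from typing import Any, Callable, Iterable, Iterator

def batch(data: Iterable[Any], size: int = 4) -> Iterator[Iterator[Any]]:
    c = count()
    for _, g in groupby(data, lambda _: next(c) // size):
        yield g

def thread(data: Iterable, func: Callable, n: int = 4) -> Iterator[Any]:
    with ThreadPoolExecutor(max_workers=n) as executor:
        for chunk in data:
            # yield from re-yields each result instead of the map iterator
            yield from executor.map(func, chunk)

def fake_api(x: int) -> int:
    # Hypothetical stand-in for a real API call
    return x * x

results = list(thread(batch(range(10)), fake_api))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```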

Answer 2

So I ended up condensing the three functions into one:

from itertools import count, groupby
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

def spread(data: Iterable, func: Callable, n=4):
    """ Combines `batch`, `thread` and `flat_map`"""
    c = count()
    with ThreadPoolExecutor(max_workers=n) as executor:
        # n serves as both the batch size and the number of worker threads
        for _, batch in groupby(data, lambda _: next(c)//n):
            yield from executor.map(func, batch)
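A usage sketch for spread follows. fake_api is a hypothetical stand-in for a network call that sleeps so that, within each batch, later items finish sooner; the function body repeats the code above (with count imported) so the block runs on its own. Because executor.map returns results in input order, the output order still matches the input.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import count, groupby
from typing import Callable, Iterable

def spread(data: Iterable, func: Callable, n=4):
    """ Combines `batch`, `thread` and `flat_map`"""
    c = count()
    with ThreadPoolExecutor(max_workers=n) as executor:
        for _, batch in groupby(data, lambda _: next(c)//n):
            yield from executor.map(func, batch)

def fake_api(x: int) -> int:
    # Hypothetical stand-in for a network call; within a batch, later items finish sooner
    time.sleep(0.01 * (5 - x % 5))
    return x * 10

print(list(spread(range(10), fake_api)))  # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```

One consequence of this design: the next batch is only submitted after the caller has consumed every result of the current batch, so at most n calls are in flight at any time.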

So all I needed was yield from to make it work. Thanks @ShadowRanger!

Answer 1 (accepted): 3 votes, 2021-10-15 07:30:25
Answer 2: 0 votes, 2021-10-17 20:22:49
