Reverse of itertools.groupby?
I'm composing generators together for some data processing. I first batch the data generator, for threading in an API call, like this:
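from itertools import count, groupby
from typing import Any, Iterable

def batch(data: Iterable[Any], size=4):
    c = count()
    # Consecutive items share the key index // size, so groupby yields runs of `size` items.
    for _, g in groupby(data, lambda _: next(c) // size):
        yield g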
which I then feed to the threader to do an API call:
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

def thread(data: Iterable, func: Callable, n=4):
    with ThreadPoolExecutor(max_workers=n) as executor:
        for batch in data:
            # executor.map returns an iterator over this batch's results
            yield executor.map(func, batch)
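To make the shape of the output at this stage concrete, here is a minimal sketch that runs the two functions above end to end. `fake_api_call` is a hypothetical stand-in for the real API call, and the type hints are dropped for brevity. It shows that `thread` yields one result iterator per batch, not the individual results.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import count, groupby


def batch(data, size=4):
    c = count()
    for _, g in groupby(data, lambda _: next(c) // size):
        yield g


def thread(data, func, n=4):
    with ThreadPoolExecutor(max_workers=n) as executor:
        for b in data:
            yield executor.map(func, b)


def fake_api_call(x):
    # Hypothetical stand-in for the real API call.
    return x * 2


# Each item produced by thread() is itself an iterator over one batch's results,
# so it has to be consumed (here with list) to reach the actual values.
nested = [list(results) for results in thread(batch(range(10)), fake_api_call)]
print(nested)
```

The nesting seen here is what the downstream stage has to undo.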
Now I'm trying to merge the batches back into a list/generator for use downstream in the generator pipeline. I tried this:
from itertools import chain
from typing import Iterable

def flat_map(batches: Iterable):
    for i in list(chain(batches)):
        yield i
But `i` seems to still be a generator and not an item from the list?
You wanted `chain(*batches)` or `chain.from_iterable(batches)`. `chain(batches)` basically just yields the same values you would get by iterating `batches` directly; it only adds a layer of wrapping. So the correct code (without `list`-ifying, which is almost certainly wrong here) is just:
from itertools import chain
from typing import Iterable

def flat_map(batches: Iterable):
    # chain(*batches) would also work, but if batches is itself an iterator, unpacking forces it to run to completion first;
    # chain.from_iterable can begin work as soon as the first batch is ready.
    return chain.from_iterable(batches)
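As a small illustration of the difference, the sketch below compares the three spellings on plain lists (used here in place of the `executor.map` results, purely as an assumption for the demo) and then checks the laziness claim with a hypothetical generator, `lazy_batches`, that records each batch as it is pulled.

```python
from itertools import chain

batches = [[1, 2], [3, 4]]

# With a single argument, chain just re-yields the elements of that argument,
# which are the batches themselves.
wrapped = list(chain(batches))
# Both of these flatten one level.
flattened = list(chain.from_iterable(batches))
unpacked = list(chain(*batches))

pulled = []  # records which batches have been requested from the source


def lazy_batches():
    # Hypothetical source of batches, for illustration only.
    for b in ([1, 2], [3, 4]):
        pulled.append(b)
        yield b


# chain.from_iterable pulls batches only as they are needed.
lazy = chain.from_iterable(lazy_batches())
first = next(lazy)
pulled_after_first = list(pulled)

# Star-unpacking has to exhaust the source before chain is even called.
pulled.clear()
eager = chain(*lazy_batches())
pulled_before_any_next = list(pulled)
```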
You don't even need `yield`, since the iterator is already producing what you want. If you need it to be a true generator, just replace `return` with `yield from` for a similar result.
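The two variants can be compared side by side. This is only a sketch: the names `flat_map_return` and `flat_map_gen` are made up here to keep both versions in one file. Both produce the same items; they differ only in the type of object the call returns.

```python
import inspect
from itertools import chain


def flat_map_return(batches):
    # Plain function: hands back the chain iterator directly.
    return chain.from_iterable(batches)


def flat_map_gen(batches):
    # Generator function: calling it creates a generator that delegates to chain.
    yield from chain.from_iterable(batches)


data = [[1, 2], [3]]
from_return = flat_map_return(data)
from_gen = flat_map_gen(data)

returns_chain_object = isinstance(from_return, chain)
returns_generator = inspect.isgenerator(from_gen)
same_items = list(from_return) == list(from_gen)
```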
Also note: you might avoid the need for the function entirely by just changing:
yield executor.map(func, batch)
to:
yield from executor.map(func, batch)
so `thread` is flattening as it goes in the first place.
So I ended up with the three functions condensed into one:
from concurrent.futures import ThreadPoolExecutor
from itertools import count, groupby
from typing import Callable, Iterable

def spread(data: Iterable, func: Callable, n=4):
    """Combines `batch`, `thread` and `flat_map`."""
    c = count()
    with ThreadPoolExecutor(max_workers=n) as executor:
        # Batch size equals the number of workers, so each batch fills the pool once.
        for _, batch in groupby(data, lambda _: next(c) // n):
            yield from executor.map(func, batch)
So I just needed `yield from` to get this to work. Thanks @ShadowRanger!