
Resetting generator object in Python

I have a generator object returned by a function with multiple yields. The preparation needed to call this generator is a rather time-consuming operation. That is why I want to reuse the generator several times.

y = FunctionWithYield()
for x in y: print(x)
#here must be something to reset 'y'
for x in y: print(x)

Of course, I have considered copying the content into a simple list. Is there a way to reset my generator?

Generators can't be rewound. You have the following options:

  1. Run the generator function again, restarting the generation:

     y = FunctionWithYield()
     for x in y: print(x)
     y = FunctionWithYield()
     for x in y: print(x)
  2. Store the generator results in a data structure in memory or on disk which you can iterate over again (a sketch of the on-disk variant appears after the discussion below):

     y = list(FunctionWithYield())
     for x in y: print(x)
     # can iterate again:
     for x in y: print(x)

The downside of option 1 is that it computes the values again. If that's CPU-intensive, you end up calculating twice. On the other hand, the downside of option 2 is the storage. The entire list of values will be stored in memory. If there are too many values, that can be impractical.

So you have the classic memory vs. processing tradeoff. I can't imagine a way of rewinding the generator without either storing the values or calculating them again.
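For the on-disk variant of option 2, here is a minimal sketch, assuming the yielded values are picklable (the results.pkl file name is arbitrary):

import pickle

# write the generator's output to disk once
with open('results.pkl', 'wb') as f:
    for value in FunctionWithYield():
        pickle.dump(value, f)

def stored_results():
    # re-iterate the stored values as many times as needed
    with open('results.pkl', 'rb') as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return

for x in stored_results(): print(x)
for x in stored_results(): print(x)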

Another option is to use the itertools.tee() function to create a second version of your generator:

import itertools
y = FunctionWithYield()
y, y_backup = itertools.tee(y)
for x in y:
    print(x)
for x in y_backup:
    print(x)

This could be beneficial from a memory usage point of view if the original iteration might not process all the items.

Another way is to build the reset capability into the generator itself, using send() to pass a sentinel that restarts its internal state:

>>> def gen():
...     def init():
...         return 0
...     i = init()
...     while True:
...         val = (yield i)
...         if val == 'restart':
...             i = init()
...         else:
...             i += 1

>>> g = gen()
>>> next(g)
0
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
>>> g.send('restart')
0
>>> next(g)
1
>>> next(g)
2

Probably the simplest solution is to wrap the expensive part in an object and pass that to the generator:

data = ExpensiveSetup()
for x in FunctionWithYield(data): pass
for x in FunctionWithYield(data): pass

This way, you can cache the expensive calculations.
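A minimal sketch of that split, where ExpensiveSetup's data attribute and the body of FunctionWithYield are assumptions made up for illustration:

class ExpensiveSetup:
    def __init__(self):
        # hypothetical: do the slow preparation once and keep the result
        self.data = [x * x for x in range(10)]

def FunctionWithYield(setup):
    # the generator itself is cheap: it only walks the precomputed data
    for item in setup.data:
        yield item

data = ExpensiveSetup()
for x in FunctionWithYield(data): print(x)
for x in FunctionWithYield(data): print(x)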

If you can keep all results in RAM at the same time, use list() to materialize the results of the generator in a plain list and work with that.
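A one-line illustration of that:

y = list(FunctionWithYield())  # materialize the generator once
for x in y: print(x)
for x in y: print(x)           # a list can be traversed again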

I want to offer a different solution to an old problem:

class IterableAdapter:
    def __init__(self, iterator_factory):
        self.iterator_factory = iterator_factory

    def __iter__(self):
        return self.iterator_factory()

squares = IterableAdapter(lambda: (x * x for x in range(5)))

for x in squares: print(x)
for x in squares: print(x)

The benefit of this, when compared to something like list(iterator), is that this has O(1) space complexity while list(iterator) is O(n). The disadvantage is that if you only have access to the iterator, but not the function that produced it, then you cannot use this method. For example, it might seem reasonable to do the following, but it will not work, because the lambda always returns the same generator object, which is exhausted after the first loop.

g = (x * x for x in range(5))

squares = IterableAdapter(lambda: g)

for x in squares: print(x)
for x in squares: print(x)

If GrzegorzOledzki's answer won't suffice, you could probably use send() to accomplish your goal. See PEP-0342 for more details on enhanced generators and yield expressions.

UPDATE: Also see itertools.tee(). It involves some of the memory vs. processing tradeoff mentioned above, but it might save some memory over just storing the generator results in a list; it depends on how you're using the generator.

If your generator is pure in the sense that its output only depends on the passed arguments and the step number, and you want the resulting generator to be restartable, here's a short snippet that might be handy:

import copy

def generator(i):
    yield from range(i)

g = generator(10)
print(list(g))
print(list(g))

class GeneratorRestartHandler(object):
    def __init__(self, gen_func, argv, kwargv):
        self.gen_func = gen_func
        self.argv = copy.copy(argv)
        self.kwargv = copy.copy(kwargv)
        self.local_copy = iter(self)

    def __iter__(self):
        return self.gen_func(*self.argv, **self.kwargv)

    def __next__(self):
        return next(self.local_copy)

def restartable(g_func: callable) -> callable:
    def tmp(*argv, **kwargv):
        return GeneratorRestartHandler(g_func, argv, kwargv)

    return tmp

@restartable
def generator2(i):
    yield from range(i)

g = generator2(10)
print(next(g))
print(list(g))
print(list(g))
print(next(g))

outputs:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[]
0
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1

Using a wrapper function to handle StopIteration

You could write a simple wrapper function for your generator-generating function that tracks when the generator is exhausted. It will do so using the StopIteration exception a generator throws when it reaches the end of iteration.

import types

def generator_wrapper(function=None, **kwargs):
    assert function is not None, "Please supply a function"
    def inner_func(function=function, **kwargs):
        generator = function(**kwargs)
        assert isinstance(generator, types.GeneratorType), "Invalid function"
        while True:
            try:
                yield next(generator)
            except StopIteration:
                # the generator is exhausted: re-initialize it and keep going
                generator = function(**kwargs)
                yield next(generator)
    return inner_func

As you can see above, when our wrapper function catches a StopIteration exception, it simply re-initializes the generator object (using another instance of the function call).

And then, assuming you define your generator-supplying function somewhere as below, you could use the Python function decorator syntax to wrap it implicitly:

@generator_wrapper
def generator_generating_function(**kwargs):
    for item in ["a value", "another value"]:
        yield item

From the official documentation of tee:

In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().

So it's best to use list(iterable) instead in your case.

You can define a function that returns your generator:

def f():
    def FunctionWithYield(generator_args):
        ...  # generator body goes here

    return FunctionWithYield

Now you can just do this as many times as you like:

for x in f()(generator_args): print(x)
for x in f()(generator_args): print(x)

I'm not sure what you meant by expensive preparation, but I guess you actually have:

data = ... # Expensive computation
y = FunctionWithYield(data)
for x in y: print(x)
#here must be something to reset 'y'
# this is expensive - data = ... # Expensive computation
# y = FunctionWithYield(data)
for x in y: print(x)

If that's the case, why not reuse data?

There is no option to reset iterators. An iterator "pops out" items as you iterate through it with the next() function. The only way is to take a backup before iterating over the iterator object. Check below.

Creating an iterator object with items 0 to 9:

i = iter(range(10))

Iterating with the next() function, which pops out an item:

print(next(i))

Converting the iterator object to a list:

L = list(i)
print(L)
output: [1, 2, 3, 4, 5, 6, 7, 8, 9]

So item 0 has already been popped out. Also, all the remaining items are popped when we convert the iterator to a list.

next(i)

Traceback (most recent call last):
  File "<pyshell#129>", line 1, in <module>
    next(i)
StopIteration

So you need to convert the iterator to a list for backup before you start iterating. A list can be converted back to an iterator with iter(<list-object>).
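A short sketch of this backup pattern:

i = iter(range(10))
backup = list(i)        # take the backup before consuming anything
for x in iter(backup):  # first pass
    print(x)
for x in iter(backup):  # "reset" by building a fresh iterator from the backup
    print(x)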

You can now use more_itertools.seekable (a third-party tool) which enables resetting iterators.

Install via > pip install more_itertools

import more_itertools as mit


y = mit.seekable(FunctionWithYield())
for x in y:
    print(x)

y.seek(0)                                              # reset iterator
for x in y:
    print(x)

Note: memory consumption grows while advancing the iterator, so be wary of large iterables.
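seek() also accepts a non-zero index, so (if I read the more_itertools docs correctly) you can jump back to any element that has already been produced, not only the start:

y = mit.seekable(FunctionWithYield())
next(y), next(y), next(y)    # consume the first three items
y.seek(1)                    # rewind to the second element
print(next(y))               # the second item comes out again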

You can do that by using itertools.cycle(): create an iterator with this method and then execute a for loop over it, which will loop over its values.

For example:

from itertools import cycle

def generator():
    for j in cycle([i for i in range(5)]):
        yield j

gen = generator()
for i in range(20):
    print(next(gen))

will generate 20 numbers, 0 to 4, repeatedly.

A note from the docs:

Note, this member of the toolkit may require significant auxiliary storage (depending on the length of the iterable).

Here is how it works for me:

csv_rows = my_generator()
for _ in range(10):
    for row in csv_rows:
        print(row)
    csv_rows = my_generator()

OK, you say you want to call a generator multiple times, but initialization is expensive... What about something like this?

class InitializedFunctionWithYield(object):
    def __init__(self):
        # do expensive initialization
        self.start = 5

    def __call__(self, *args, **kwargs):
        # do cheap iteration
        for i in range(5):
            yield self.start + i

y = InitializedFunctionWithYield()

for x in y():
    print(x)

for x in y():
    print(x)

Alternatively, you could just make your own class that follows the iterator protocol and defines some sort of "reset" function.

class MyIterator(object):
    def __init__(self):
        self.reset()

    def reset(self):
        self.i = 5

    def __iter__(self):
        return self

    def __next__(self):
        i = self.i
        if i > 0:
            self.i -= 1
            return i
        else:
            raise StopIteration()

my_iterator = MyIterator()

for x in my_iterator:
    print(x)

print('resetting...')
my_iterator.reset()

for x in my_iterator:
    print(x)

https://docs.python.org/2/library/stdtypes.html#iterator-types
http://anandology.com/python-practice-book/iterators.html

My answer solves a slightly different problem: the generator is expensive to initialize and each generated object is expensive to generate, but we need to consume the generator multiple times in multiple functions. In order to call the generator and generate each object exactly once, we can use threads and run each of the consuming methods in a different thread. We may not achieve true parallelism due to the GIL, but we will achieve our goal.

This approach did a good job in the following case: a deep learning model processes a lot of images. The result is a lot of masks for a lot of objects in the image. Each mask consumes memory. We have around 10 methods which compute different statistics and metrics, but they take all the images at once, and all the images cannot fit in memory. The methods can easily be rewritten to accept an iterator.

import threading
from typing import List

class GeneratorSplitter:
    '''
    Split a generator object into multiple generators which are synchronised.
    Each call to each of the sub-generators causes only one call to the input
    generator. This way multiple methods on threads can iterate the input
    generator, and the generator is iterated only once.
    '''

    def __init__(self, gen):
        self.gen = gen
        self.consumers: List["GeneratorSplitter.InnerGen"] = []
        self.thread: threading.Thread = None
        self.value = None
        self.finished = False
        self.exception = None

    def GetConsumer(self):
        # Returns a generator object.
        cons = self.InnerGen(self)
        self.consumers.append(cons)
        return cons

    def _Work(self):
        try:
            for d in self.gen:
                # wait until every consumer has read the previous value
                for cons in self.consumers:
                    cons.consumed.wait()
                    cons.consumed.clear()

                self.value = d

                for cons in self.consumers:
                    cons.readyToRead.set()

            for cons in self.consumers:
                cons.consumed.wait()

            self.finished = True

            for cons in self.consumers:
                cons.readyToRead.set()
        except Exception as ex:
            self.exception = ex
            for cons in self.consumers:
                cons.readyToRead.set()

    def Start(self):
        self.thread = threading.Thread(target=self._Work)
        self.thread.start()

    class InnerGen:
        def __init__(self, parent: "GeneratorSplitter"):
            self.parent: "GeneratorSplitter" = parent
            self.readyToRead: threading.Event = threading.Event()
            self.consumed: threading.Event = threading.Event()
            self.consumed.set()

        def __iter__(self):
            return self

        def __next__(self):
            self.readyToRead.wait()
            self.readyToRead.clear()
            if self.parent.finished:
                raise StopIteration()
            if self.parent.exception:
                raise self.parent.exception
            val = self.parent.value
            self.consumed.set()
            return val

Usage:

from concurrent.futures import ThreadPoolExecutor

genSplitter = GeneratorSplitter(expensiveGenerator)

metrics = {}
executor = ThreadPoolExecutor(max_workers=3)
# mean, max and someFancyMetric are consumer functions that accept an iterator
f1 = executor.submit(mean, genSplitter.GetConsumer())
f2 = executor.submit(max, genSplitter.GetConsumer())
f3 = executor.submit(someFancyMetric, genSplitter.GetConsumer())
genSplitter.Start()

metrics.update(f1.result())
metrics.update(f2.result())
metrics.update(f3.result())

If you want to reuse this generator multiple times, you can use functools.partial.

from functools import partial
func_with_yield = partial(FunctionWithYield)

for i in range(100):
    for x in func_with_yield():
        print(x)

This wraps the generator function in another callable, so each time you call func_with_yield() it creates a fresh generator object from the same function.

Note: it also accepts function arguments, as in partial(FunctionWithYield, args), if you have arguments.
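For instance, a sketch assuming the expensive preparation is passed in as a data argument (ExpensiveSetup is hypothetical, borrowed from the earlier answer):

from functools import partial

data = ExpensiveSetup()                             # hypothetical expensive setup
func_with_yield = partial(FunctionWithYield, data)  # bind the argument once

for x in func_with_yield():
    print(x)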

It can also be done with a code object. Here is an example.

code_str = "y=(a for a in [1,2,3,4])"
code1 = compile(code_str, '<string>', 'single')
exec(code1)
for i in y: print(i)

1 2 3 4

for i in y: print(i)

(no output: the generator is exhausted)

exec(code1)
for i in y: print(i)

1 2 3 4
