How to clone a Python generator object?

Consider this scenario:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os

walk = os.walk('/home')

# First pass: consumes the generator.
for root, dirs, files in walk:
    for pathname in dirs + files:
        print(os.path.join(root, pathname))

# Second pass: prints nothing, because walk is already exhausted.
for root, dirs, files in walk:
    for pathname in dirs + files:
        print(os.path.join(root, pathname))

I know this example is somewhat redundant, but assume we need to use the same walk data more than once. I have a benchmark scenario in which using the same walk data is mandatory to get meaningful results.

I tried walk2 = walk to clone it for use in the second iteration, but it didn't work. The question is: how can I copy it? Is it even possible?

Thank you in advance.

You can use itertools.tee():

import itertools

walk, walk2 = itertools.tee(walk)

Note that this might "need significant extra storage", as the documentation points out.
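As a minimal sketch of why plain assignment fails and what tee() does instead, the example below uses a small hypothetical generator function, numbers(), in place of os.walk so that the output is predictable. Assignment only binds a second name to the same generator object, while tee() returns independent iterators that buffer items internally.

```python
import itertools

def numbers():
    # Hypothetical stand-in for os.walk(): yields 0, 1, 2.
    for n in range(3):
        yield n

gen = numbers()

# Plain assignment does not copy anything: both names refer to one generator.
alias = gen
same_object = alias is gen

# tee() returns independent iterators that share an internal buffer.
first, second = itertools.tee(gen)
first_items = list(first)
second_items = list(second)

# The original generator has been consumed by the tee iterators.
leftover = list(gen)
```

Once tee() has been called, the original generator should not be advanced directly, otherwise the tee iterators will miss the items it yields.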

If you know you are going to iterate through the whole generator for every use, you will probably get the best performance by unrolling the generator into a list and using the list multiple times.

walk = list(os.walk('/home'))
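A small sketch of the list approach, assuming a throwaway directory created with tempfile instead of '/home' so the result is reproducible. The names snapshot and all_paths are illustrative only. It shows that the materialized list can be traversed any number of times with identical results.

```python
import os
import tempfile

def all_paths(walk_data):
    # Flatten (root, dirs, files) triples into a list of full paths.
    return [os.path.join(root, name)
            for root, dirs, files in walk_data
            for name in dirs + files]

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny tree: <tmp>/sub/a.txt
    os.mkdir(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "sub", "a.txt"), "w").close()
    # Materialize the walk once; the list is a frozen snapshot.
    snapshot = list(os.walk(tmp))

# The snapshot can be reused even after the directory is gone.
first_pass = all_paths(snapshot)
second_pass = all_paths(snapshot)
```

Because the snapshot is taken once, later changes on disk are not reflected in it, which is what a benchmark that needs identical input on every run wants.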

Define a function:

def walk_home():
    for r in os.walk('/home'):
        yield r

Or even this:

def walk_home():
    return os.walk('/home')

Both are used like this:

for root, dirs, files in walk_home():
    for pathname in dirs + files:
        print(os.path.join(root, pathname))

This is a good use case for functools.partial() to make a quick generator factory:

from functools import partial
import os

walk_factory = partial(os.walk, '/home')

walk1, walk2, walk3 = walk_factory(), walk_factory(), walk_factory()

What functools.partial() does is hard to describe in plain words, but this is what it is for.

It fills in some of a function's parameters without executing the function. Consequently, it acts as a function/generator factory.
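To make the factory behaviour concrete, here is a minimal sketch that uses a hypothetical generator function, countdown(), rather than os.walk. Each call to the partial object runs the underlying function again and so produces a fresh, independent generator.

```python
from functools import partial

def countdown(start):
    # Hypothetical generator used only for illustration.
    while start > 0:
        yield start
        start -= 1

# Bind the argument now; nothing is executed until the factory is called.
countdown_factory = partial(countdown, 3)

run1 = countdown_factory()
run2 = countdown_factory()

distinct = run1 is not run2
items1 = list(run1)
items2 = list(run2)
```

Note that with os.walk each generator reads the filesystem again, so the results only match if the directory tree has not changed between runs.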

This answer aims to extend and elaborate on what the other answers have said. The solution will necessarily vary depending on exactly what you aim to achieve.

If you want to iterate over the exact same result of os.walk multiple times, you will need to build a list from the items of the os.walk iterable (i.e. walk = list(os.walk(path))).

If you must guarantee that the data remains the same, that is probably your only option. However, there are several scenarios in which this is not possible or desirable.

  1. It will not be possible to list() an iterable if the output is too large (e.g. attempting to list() an entire filesystem may freeze your computer).
  2. It is not desirable to list() an iterable if you wish to acquire "fresh" data prior to each use.

In the event that list() is not suitable, you will need to run your generator on demand. Note that generators are exhausted after a single pass, so this poses a slight problem. In order to "rerun" your generator multiple times, you can use the following pattern:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os

class WalkMaker:
    def __init__(self, path):
        self.path = path
    def __iter__(self):
        for root, dirs, files in os.walk(self.path):
            for pathname in dirs + files:
                yield os.path.join(root, pathname)

walk = WalkMaker('/home')

for path in walk:
    pass

# do something...

for path in walk:
    pass

The aforementioned design pattern will allow you to keep your code DRY.
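The same idea can be generalized beyond os.walk. The sketch below is not part of the original answer: Reiterable and squares are hypothetical names, and the class simply stores a generator function with its arguments and calls it again each time iteration starts.

```python
class Reiterable:
    """Wrap a generator function so the object can be iterated repeatedly."""

    def __init__(self, factory, *args, **kwargs):
        self._factory = factory
        self._args = args
        self._kwargs = kwargs

    def __iter__(self):
        # Every for-loop triggers a fresh call, hence a fresh generator.
        return iter(self._factory(*self._args, **self._kwargs))

def squares(limit):
    # Hypothetical generator used only for illustration.
    for n in range(limit):
        yield n * n

sq = Reiterable(squares, 4)
first = list(sq)
second = list(sq)
```

As with WalkMaker, each pass recomputes the data, so nothing is cached in memory between iterations.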

This "Python Generator Listeners" code allows you to have many listeners on a single generator, like os.walk, and even have someone "chime in" later.

def walkme():
    return os.walk('/home')

m1 = Muxer(walkme)
m2 = Muxer(walkme)

Then m1 and m2 can even run in separate threads and process items at their leisure.

See: https://gist.github.com/earonesty/cafa4626a2def6766acf5098331157b3

import queue
from threading import Lock
from collections import namedtuple

class Muxer():
    Entry = namedtuple('Entry', 'genref listeners lock')

    already = {}
    top_lock = Lock()

    def __init__(self, func, restart=False):
        self.restart = restart
        self.func = func
        self.queue = queue.Queue()

        with self.top_lock:
            if func not in self.already:
                self.already[func] = self.Entry([func()], [], Lock())
            ent = self.already[func]

        self.genref = ent.genref
        self.lock = ent.lock
        self.listeners = ent.listeners

        self.listeners.append(self)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            e = self.queue.get_nowait()
        except queue.Empty:
            with self.lock:
                try:
                    e = self.queue.get_nowait()
                except queue.Empty:
                    try:
                        e = next(self.genref[0])
                        for other in self.listeners:
                            if other is not self:
                                other.queue.put(e)
                    except StopIteration:
                        if self.restart:
                            self.genref[0] = self.func()
                        raise
        return e

    def __del__(self):
        with self.top_lock:
            try:
                self.listeners.remove(self)
            except ValueError:
                pass
            if not self.listeners and self.func in self.already:
                del self.already[self.func]
