
Issues working with Python generators and the OpenStack Swift client

I'm having a problem with Python generators while working with the OpenStack Swift client library.

The problem at hand is that I am trying to retrieve a large string of data (about 7MB) from a specific URL, split the string into smaller chunks, and return a generator object whose iterations each yield one chunk of the string. In the test suite, the data is just a string that is handed to a monkeypatched version of the Swift client class for processing.

The monkeypatching is done with the following metaclass:

def monkeypatch_class(name, bases, namespace):
    '''Guido's monkeypatch metaclass.'''
    assert len(bases) == 1, "Exactly one base class required"
    base = bases[0]
    for name, value in namespace.iteritems():
        if name != "__metaclass__":
            setattr(base, name, value)
    return base

And in the test suite:

from swiftclient import client
import StringIO
import utils

class Connection(client.Connection):
    __metaclass__ = monkeypatch_class

    def get_object(self, path, obj, resp_chunk_size=None, ...):
        contents = None
        headers = {}

        # retrieve content from path and store it in 'contents'
        ...

        if resp_chunk_size is not None:
            # stream the string into chunks
            def _object_body():
                stream = StringIO.StringIO(contents)
                buf = stream.read(resp_chunk_size)
                while buf:
                    yield buf
                    buf = stream.read(resp_chunk_size)
            contents = _object_body()
        return headers, contents

The returned generator object is then passed back up through a streaming function in the storage class:

class SwiftStorage(Storage):

    def get_content(self, path, chunk_size=None):
        path = self._init_path(path)
        try:
            _, obj = self._connection.get_object(
                self._container,
                path,
                resp_chunk_size=chunk_size)
            return obj
        except Exception:
            raise IOError("Could not get content: {}".format(path))

    def stream_read(self, path):
        try:
            return self.get_content(path, chunk_size=self.buffer_size)
        except Exception:
            raise OSError(
                "Could not read content from stream: {}".format(path))

And finally, in my test suite:

def test_stream(self):
    filename = self.gen_random_string()
    # test 7MB
    content = self.gen_random_string(7 * 1024 * 1024)
    # wrap the content in a file-like object for stream_write
    io = StringIO.StringIO(content)
    self._storage.stream_write(filename, io)
    io.close()
    # test read / write
    data = ''
    for buf in self._storage.stream_read(filename):
        data += buf
    self.assertEqual(content,
                     data,
                     "stream read failed. output: {}".format(data))

The test fails with this output:

======================================================================
FAIL: test_stream (test_swift_storage.TestSwiftStorage)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/bacongobbler/git/github.com/bacongobbler/docker-registry/test/test_local_storage.py", line 46, in test_stream
    "stream read failed. output: {}".format(data))
AssertionError: stream read failed. output: <generator object _object_body at 0x2a6bd20>

I tried isolating this with a simple Python script that follows the same flow as the code above, and it passed without issues:

def gen_num():
    def _object_body():
        for i in range(10000000):
            yield i
    return _object_body()

def get_num():
    return gen_num()

def stream_read():
    return get_num()

def main():
    num = 0
    for i in stream_read():
        num += i
    print num

if __name__ == '__main__':
    main()

Any help with this issue is greatly appreciated :)

In your get_object method, you're assigning the return value of _object_body() to the contents variable. However, that same variable is the one holding your actual data, and _object_body reads it on its very first line.

The problem is that _object_body is a generator function (it uses yield). When you call it, it only creates a generator object; none of the function's code runs until you iterate over that generator. And because _object_body is a closure, it looks up contents at the moment its code runs, not at the moment it is defined. So by the time the function's code actually starts running (in the for loop in test_stream), you have long since reassigned contents = _object_body().
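A minimal sketch of that timing, using hypothetical names (make_body, events) and Python 3 syntax: the generator's body only starts when next() is first called, after the function that created the generator has already returned.

```python
def make_body(events):
    def _object_body():
        # This line runs only once iteration begins.
        events.append("body started")
        yield "chunk"

    gen = _object_body()  # creates the generator object; runs none of the body
    events.append("generator created")
    return gen


events = []
gen = make_body(events)
print(events)      # ['generator created']
first = next(gen)  # the body executes up to its first yield
print(events)      # ['generator created', 'body started']
```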

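Your stream = StringIO.StringIO(contents) therefore creates a StringIO object from the generator object, not from the data. Python 2's StringIO.StringIO converts any non-string argument with str(), so the stream holds the generator's repr, which is exactly what your assertion message shows: <generator object _object_body at 0x2a6bd20>.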
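The conversion can be checked directly. This sketch (Python 3, names hypothetical) reproduces Python 2's StringIO.StringIO behaviour with an explicit str() call, because Python 3's io.StringIO behaves differently and rejects a generator with a TypeError.

```python
import io


def _object_body():
    yield "real data"


gen = _object_body()

# Python 2's StringIO.StringIO applies str() to non-string arguments,
# so the buffer would hold the generator's repr instead of the data.
text = gen if isinstance(gen, str) else str(gen)
print(text)  # <generator object _object_body at 0x...> (address varies)

# Python 3's io.StringIO refuses the generator outright.
try:
    io.StringIO(gen)
except TypeError as exc:
    print("io.StringIO raised TypeError:", exc)
```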

Here's a minimal reproduction case that illustrates the problem:

def foo():
    contents = "Hello!"

    def bar():
        print contents
        yield 1

    # Only create the generator. This line runs none of the code in bar.
    contents = bar()

    print "About to start running..."
    for i in contents:
        # Now we run the code in bar, but contents is now bound to
        # the generator object, so this prints something like
        # <generator object bar at 0x...> instead of "Hello!"
        pass

foo()
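One way to fix get_object, sketched as a hypothetical standalone helper get_object_body (Python 3): pass the data into the generator function as an argument. Arguments are bound when the generator object is created, so the generator keeps a reference to the original string even after contents is reassigned. Simply returning the generator under a different name than contents works just as well. This sketch also swaps StringIO for plain slicing, which works for both str and bytes; that substitution is a choice of this example, not part of the original code.

```python
def get_object_body(contents, resp_chunk_size=None):
    """Hypothetical helper mirroring the chunking branch of get_object."""
    if resp_chunk_size is None:
        return contents

    def _object_body(data):
        # `data` is bound as soon as _object_body(contents) is called, even
        # though this loop does not start until iteration begins.
        for start in range(0, len(data), resp_chunk_size):
            yield data[start:start + resp_chunk_size]

    # Rebinding `contents` is now harmless: the generator holds its own
    # reference to the original data in `data`.
    contents = _object_body(contents)
    return contents


print(list(get_object_body("abcdefg", 3)))  # ['abc', 'def', 'g']
```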

Disclaimer: technical posts on this site are distributed under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For questions, contact yoyou2525@163.com.