How To Create a Python file-like Object from an Iterator

I am testing the throughput of writing to S3 from a Python Glue shell job by using the upload_fileobj function from the boto3 client. The input to this function is:

Fileobj (a file-like object) -- A file-like object to upload. At a minimum, it must implement the read method, and must return bytes.

To have the test isolate just the throughput, as opposed to memory or CPU capabilities, I think the best way to use upload_fileobj would be to pass an iterator that produces N bytes of the value 0.

In Python, how can a "file-like object" be created from an iterator?

I'm looking for something of the form:

from itertools import repeat

number_of_bytes = 1024 * 1024

zero_iterator = repeat(b'\x00', number_of_bytes)  # b'\x00' is the byte with value 0; b'0' would be the ASCII digit (0x30)

file_like_object = something(zero_iterator) # fill in 'something'

This would then be passed to boto3 for writing:

session.client('s3').upload_fileobj(file_like_object, Bucket='my_bucket', Key='my_key')  # Key is required; 'my_key' is a placeholder

Thank you in advance for your consideration and response.

This is a simplified version of the answer at https://stackoverflow.com/a/70547492/1319998. Since we only need to deal with bytes, it should be suitable for boto3's upload_fileobj:

def to_file_like_obj(iterable):
    chunk = b''   # chunk currently being consumed
    offset = 0    # number of bytes of chunk already returned
    it = iter(iterable)

    def up_to_iter(size):
        nonlocal chunk, offset

        while size:
            # Current chunk exhausted: fetch the next one, or stop at end of input
            if offset == len(chunk):
                try:
                    chunk = next(it)
                except StopIteration:
                    break
                else:
                    offset = 0
            to_yield = min(size, len(chunk) - offset)
            offset = offset + to_yield
            size -= to_yield
            yield chunk[offset - to_yield:offset]

    class FileLikeObj:
        def read(self, size=-1):
            # None or a negative size means "read until the iterable is exhausted"
            return b''.join(up_to_iter(float('inf') if size is None or size < 0 else size))

    return FileLikeObj()
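
For the throughput test described in the question, an iterator that yields one byte per item would probably spend most of its time in Python-level iteration rather than on the network. A minimal sketch of an iterable that yields larger zero-filled chunks is shown below; the name zero_chunks and the 64 KiB default chunk size are assumptions made for illustration, not part of the original answer.

```python
def zero_chunks(total_bytes, chunk_size=64 * 1024):
    """Yield zero-filled bytes objects whose lengths sum to total_bytes.

    Hypothetical helper; the chunk size is an arbitrary choice.
    """
    chunk = bytes(chunk_size)  # bytes(n) creates n bytes of value 0
    full, remainder = divmod(total_bytes, chunk_size)
    for _ in range(full):
        yield chunk  # the same object is reused, so memory use stays flat
    if remainder:
        yield chunk[:remainder]


number_of_bytes = 1024 * 1024 + 5
total = sum(len(c) for c in zero_chunks(number_of_bytes))
```

The result of zero_chunks(number_of_bytes) could then be passed to to_file_like_obj in place of a one-byte-per-item iterator.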

If you have an iterable that yields bytes, say my_iterable, this can be used with boto3 as follows:

import boto3

target_obj = boto3.Session().resource('s3').Bucket('my-target-bucket').Object('my/target/key')
target_obj.upload_fileobj(to_file_like_obj(my_iterable))
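
If a fuller file-like interface is wanted (for example buffering through io.BufferedReader), one alternative is to subclass io.RawIOBase and implement readinto. The following is only a sketch under that assumption and is not from the linked answer; the class name IterStream is hypothetical, and it has not been tested against S3 here.

```python
import io
from itertools import repeat


class IterStream(io.RawIOBase):
    """Hypothetical read-only raw stream over an iterable of bytes."""

    def __init__(self, iterable):
        super().__init__()
        self._it = iter(iterable)
        self._leftover = b''  # unread tail of the most recent chunk

    def readable(self):
        return True

    def readinto(self, buffer):
        # Skip empty chunks; returning 0 signals end of stream
        while not self._leftover:
            try:
                self._leftover = next(self._it)
            except StopIteration:
                return 0
        n = min(len(buffer), len(self._leftover))
        buffer[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n


chunk = b'\x00' * 1024
stream = io.BufferedReader(IterStream(repeat(chunk, 4)))
first = stream.read(1500)  # spans more than one chunk
rest = stream.read()       # everything that remains
```

Because the wrapped stream exposes read and returns bytes, it should be usable wherever to_file_like_obj(my_iterable) is used above.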
