
How to convert async generator stream into a file-like object in Python3?

So I made a web service (based on Starlette) with an endpoint that accepts a binary body. I want to feed this binary body to fastavro.

The Starlette docs say I can access the raw data as an async stream with request.stream():

async for chunk in request.stream():
    # do something with chunk...
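
For context, a minimal Starlette route where this stream gets consumed might look like the sketch below (the route path and handler name are just placeholders):

from starlette.applications import Starlette
from starlette.requests import Request
from starlette.responses import JSONResponse
from starlette.routing import Route


async def ingest(request: Request):
    total = 0
    # request.stream() yields the raw request body chunk by chunk as it arrives
    async for chunk in request.stream():
        total += len(chunk)
    return JSONResponse({"received_bytes": total})


app = Starlette(routes=[Route("/ingest", ingest, methods=["POST"])])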

Now, I want to feed the stream to fastavro. The thing is, fastavro's reader needs a file-like input stream:

from fastavro import reader

with open('some-file.avro', 'rb') as fo:
    avro_reader = reader(fo)

My question is, is there a clean way to transform this async stream into a file-like one?

I guess I could implement an object that has a read() method that awaits and returns the data returned by request.stream(). But if the caller passes a size, I need to keep a memory buffer, don't I? Could something be based on BufferedRWPair?

Or is the only way to store the whole stream first to the disk or memory, before feeding it to fastavro?

Thanks in advance!

I ended up using a SpooledTemporaryFile:

from tempfile import SpooledTemporaryFile

from fastavro import reader

# Buffer the body in memory, spilling to disk past MAX_RECEIVED_DATA_MEMORY_SIZE
data_file = SpooledTemporaryFile(mode='w+b',
                                 max_size=MAX_RECEIVED_DATA_MEMORY_SIZE)
async for chunk in request.stream():
    data_file.write(chunk)
data_file.seek(0)  # rewind before handing the file-like object to fastavro
avro_reader = reader(data_file)

It's not the ideal solution I envisioned (somehow passing the data directly from the input to the output), but it's still good enough...
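
For reference, here is roughly how that approach could be wired into a complete Starlette endpoint; the handler name, the route, and the 1 MiB threshold are illustrative, not part of the original answer:

from tempfile import SpooledTemporaryFile

from fastavro import reader
from starlette.requests import Request
from starlette.responses import JSONResponse

# Illustrative threshold: keep up to 1 MiB in memory, then spill to disk.
MAX_RECEIVED_DATA_MEMORY_SIZE = 1024 * 1024


async def ingest_avro(request: Request):
    with SpooledTemporaryFile(mode='w+b',
                              max_size=MAX_RECEIVED_DATA_MEMORY_SIZE) as data_file:
        async for chunk in request.stream():
            data_file.write(chunk)
        data_file.seek(0)
        # fastavro reads synchronously from the buffered, now-complete body
        records = list(reader(data_file))
    return JSONResponse({"record_count": len(records)})

The trade-off stays the same as in the answer: the whole body is materialized before fastavro sees it, and the decoding runs synchronously on the event loop, so very large files may be worth offloading to a thread.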

I encountered the same problem and wrote a compact class, StreamingBody. It does exactly what I need.

from typing import AsyncIterator
import asyncio


# Dummy async iterator that yields `block_count` chunks of `block_size` bytes;
# it only exists to exercise StreamingBody in main() below.
class AsyncGen:
    def __init__(self, block_count, block_size) -> None:
        self.bc = block_count
        self.bs = block_size

    def __aiter__(self):
        return self

    async def __anext__(self):

        if self.bc == 0:
            raise StopAsyncIteration()

        self.bc -= 1
        return b"A" * self.bs


class StreamingBody:
    """Wrap an async iterator of byte chunks in an awaitable, file-like read()."""

    _chunks: AsyncIterator[bytes]
    _backlog: bytes  # bytes already pulled from the iterator but not yet returned

    def __init__(self, chunks: AsyncIterator[bytes]):
        self._chunks = chunks
        self._backlog = b""

    async def _read_until_end(self):
        # Drain the iterator completely, prepending any backlog left over
        # from a previous sized read.
        content = self._backlog
        self._backlog = b""

        while True:
            try:
                content += await self._chunks.__anext__()
            except StopAsyncIteration:
                break

        return content

    async def _read_chunk(self, size: int):
        # Pull chunks until at least `size` bytes are available (or the
        # iterator is exhausted); anything beyond `size` stays in the backlog.
        content = self._backlog
        bytes_read = len(self._backlog)

        while bytes_read < size:

            try:
                chunk = await self._chunks.__anext__()
            except StopAsyncIteration:
                break

            content += chunk
            bytes_read += len(chunk)

        self._backlog = content[size:]
        content = content[:size]

        return content

    async def read(self, size: int = -1):
        if size > 0:
            return await self._read_chunk(size)
        elif size == -1:
            return await self._read_until_end()
        else:
            return b""

async def main():
    async_gen = AsyncGen(11, 3)
    body = StreamingBody(async_gen)

    res = await body.read(11)
    print(f"[{len(res)}]: {res}")

    res = await body.read()
    print(f"[{len(res)}]: {res}")

    res = await body.read()
    print(f"[{len(res)}]: {res}")


asyncio.run(main())
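
One caveat: read() here is a coroutine, so StreamingBody cannot be handed directly to fastavro's synchronous reader; it is mainly useful when the consumer is itself async. A hypothetical sketch of plugging it into the request stream from the question (the magic-byte check is only for illustration):

from starlette.requests import Request
from starlette.responses import JSONResponse


async def inspect_avro(request: Request):
    body = StreamingBody(request.stream())

    # Avro object container files start with the 4-byte magic b"Obj\x01"
    magic = await body.read(4)
    if magic != b"Obj\x01":
        return JSONResponse({"error": "not an Avro container file"}, status_code=400)

    rest = await body.read()  # drain whatever is left
    return JSONResponse({"magic_ok": True, "remaining_bytes": len(rest)})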

