
Python aiohttp module: ambiguous .content attribute

Here is a little code snippet:

import aiohttp
import aiofiles

async def fetch(url):
    # starting a session
    async with aiohttp.ClientSession() as session:
        # starting a get request
        async with session.get(url) as response:
            # getting response content
            content = await response.content
            return content
 
async def save_file(file_name, content):
    async with aiofiles.open(f'./binary/{file_name}', 'wb') as f:
        while True:
            chunk = content.read(1024)
            if not chunk:
                break
            f.write(chunk)

I am trying to download some binary files using the aiohttp library and then pass them to a coroutine using the aiofiles library to write the files to disk. I have read the documentation but still couldn't figure out: can I pass content = await response.content, or is it closed when the async with .. handle is closed? Because on a secondary blog, I found:

According to aiohttp's documentation, because the response object was created in a context manager, it technically calls release() implicitly.

Which confuses me: should I embed the logic of the second function inside the response handle, or is my logic correct?

The async context manager will close the resources related to the request, so if you return from the function, you have to make sure you've read everything of interest. So you have two options:

  1. read the entire response into memory, e.g. with content = await response.read() (see the sketch after this list), or
  2. if the file doesn't fit into memory (or if you want to speed things up by reading and writing in parallel), use a queue or an async iterator to parallelize reading and writing.
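
For completeness, here is a minimal, untested sketch of option #1, reusing the fetch/save_file names from the question; it assumes the whole file fits comfortably in memory:

import aiohttp
import aiofiles

async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            # read() buffers the entire body; the returned bytes remain
            # valid after the response is closed
            return await response.read()

async def save_file(file_name, content):
    async with aiofiles.open(f'./binary/{file_name}', 'wb') as f:
        await f.write(content)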

Here is an untested implementation of #2:

async def fetch(url):
    # return an async generator over contents of URL
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            # getting response content in chunks no larger than 4K
            async for chunk in response.content.iter_chunked(4096):
                yield chunk

async def save_file(file_name, content_iter):
    async with aiofiles.open(f'./binary/{file_name}', 'wb') as f:
        async for chunk in content_iter:
            await f.write(chunk)

async def main():
    await save_file(file_name, fetch(url))

Thanks to user4815162342's code, I could find a solution by parallelizing the fetch and write coroutines. I would have marked his code as the accepted solution, but since I had to add some code to make it work, here it is:

import asyncio
import aiohttp
import aiofiles

# fetch binary from server
async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            async for chunk in response.content.iter_chunked(4096):
                yield chunk

# write binary function
async def save_file(file_name, chunk_iter):
    # create_dir_tree and list_binary_sub_dirs are helpers defined elsewhere in my project
    list(map(create_dir_tree, list_binary_sub_dirs))
    async with aiofiles.open(f'./binary/bin_ts/{file_name}', 'wb') as f:
        async for chunk in chunk_iter:
            await f.write(chunk)
    

async def main(urls):
    tasks = []
    for url in urls:
        print('running on sublist')
        file_name = url.rpartition('/')[-1]
        request_ts = fetch(url)
        tasks.append(save_file(file_name, request_ts))
    await asyncio.gather(*tasks)

asyncio.run(main(some_list_of_urls))
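
One design note: the code above opens a new ClientSession for every URL, while the aiohttp documentation recommends reusing a single session across requests. A variant that shares one session could look like this untested sketch (the extra session parameter on fetch is an assumption, not part of the original code; save_file is unchanged from above):

# hypothetical variant of fetch that reuses a shared session
async def fetch(session, url):
    async with session.get(url) as response:
        async for chunk in response.content.iter_chunked(4096):
            yield chunk

async def main(urls):
    # one ClientSession for all downloads, per the aiohttp docs' advice
    async with aiohttp.ClientSession() as session:
        tasks = []
        for url in urls:
            file_name = url.rpartition('/')[-1]
            tasks.append(save_file(file_name, fetch(session, url)))
        await asyncio.gather(*tasks)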
