Python aiohttp module: ambiguous .content attribute
Here is a little code snippet:
import aiohttp
import aiofiles

async def fetch(url):
    # starting a session
    async with aiohttp.ClientSession() as session:
        # starting a get request
        async with session.get(url) as response:
            # getting response content
            content = await response.content
            return content

async def save_file(file_name, content):
    async with aiofiles.open(f'./binary/{file_name}', 'wb') as f:
        while True:
            chunk = content.read(1024)
            if not chunk:
                break
            f.write(chunk)
I am trying to download some binary files using the aiohttp library and then pass them to a coroutine using the aiofiles library, which writes the files to disk. I have read the documentation but still couldn't figure out: can I pass out content = await response.content, or is it closed when the async with.. handle is closed? Because on a secondary blog, I found:

    According to aiohttp's documentation, because the response object was created in a context manager, it technically calls release() implicitly.

Which confuses me: should I embed the logic of the second function inside the response handle, or is my logic correct?
The async context manager will close the resources related to the request, so if you return from the function, you have to make sure you've read everything of interest. So you have two options:

1. read the whole response into memory with content = await response.read(), or
2. if the file doesn't fit into memory (and also if you want to speed things up by reading and writing in parallel), stream it with an async iterator.
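For reference, a minimal sketch of option #1, assuming the same fetch(url)/save_file(file_name, content) shape as in the question (file_name and url are illustrative placeholders):

    import aiohttp
    import aiofiles

    async def fetch(url):
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                # read the whole body while the response is still open
                return await response.read()

    async def save_file(file_name, content):
        # content is a plain bytes object here, so one awaited write suffices
        async with aiofiles.open(f'./binary/{file_name}', 'wb') as f:
            await f.write(content)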
Here is an untested implementation of #2:
async def fetch(url):
    # return an async generator over contents of URL
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            # getting response content in chunks no larger than 4K;
            # iter_chunked() yields asynchronously, hence async for
            async for chunk in response.content.iter_chunked(4096):
                yield chunk

async def save_file(file_name, content_iter):
    async with aiofiles.open(f'./binary/{file_name}', 'wb') as f:
        async for chunk in content_iter:
            await f.write(chunk)  # aiofiles file methods are coroutines

async def main():
    # file_name and url are assumed to be defined by the caller
    await save_file(file_name, fetch(url))
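Note the shape of this design: fetch() is an async generator, so the HTTP session and response stay open for exactly as long as the consumer keeps iterating, and only one 4K chunk at a time is held in memory rather than the whole file.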
Thanks to user4815162342's code I could find a solution by parallelizing the fetch and write coroutines. I would've checked his code as the accepted solution, but since I had to add some code to make it work, here it is:
import asyncio
import aiohttp
import aiofiles

# fetch binary from server
async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            async for chunk in response.content.iter_chunked(4096):
                yield chunk

# write binary function
async def save_file(file_name, chunk_iter):
    # create_dir_tree and list_binary_sub_dirs are helpers defined elsewhere
    list(map(create_dir_tree, list_binary_sub_dirs))
    async with aiofiles.open(f'./binary/bin_ts/{file_name}', 'wb') as f:
        async for chunk in chunk_iter:
            await f.write(chunk)

async def main(urls):
    tasks = []
    for url in urls:
        print('running on sublist')
        file_name = url.rpartition('/')[-1]
        request_ts = fetch(url)
        tasks.append(save_file(file_name, request_ts))
    await asyncio.gather(*tasks)

asyncio.run(main(some_list_of_urls))
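One possible refinement, not part of the solution above: if some_list_of_urls is long, asyncio.gather starts every download at once. A sketch of capping the number of in-flight downloads with asyncio.Semaphore (MAX_CONCURRENT is an illustrative name):

    import asyncio

    MAX_CONCURRENT = 10  # illustrative cap on simultaneous downloads

    async def bounded_save(sem, file_name, chunk_iter):
        # the semaphore is acquired before the request is made, because
        # fetch() is lazy: no connection is opened until save_file()
        # pulls the first chunk
        async with sem:
            await save_file(file_name, chunk_iter)

    async def main(urls):
        sem = asyncio.Semaphore(MAX_CONCURRENT)
        tasks = [bounded_save(sem, url.rpartition('/')[-1], fetch(url))
                 for url in urls]
        await asyncio.gather(*tasks)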