Making sense of asyncio.as_completed()
Imagine a simple program like the one below, with which I fetch a data field from a multi-page JSON dataset available through an API gateway. (Sorry, I couldn't find a free JSON API that supports pagination, so the example isn't fully reproducible.)
import asyncio
import aiohttp

async def fetch(url, params=None):
    async with aiohttp.ClientSession() as session:
        async with session.get(url, params) as response:
            return await response.json()

async def get_all_pages(base_url):
    def paginate(size=10**6):
        limit = 100
        offset = 0
        while offset <= size:
            yield {"offset": offset, "limit": limit}
            offset += limit

    total = (await fetch(base_url))["data"]["total"]  # total number of pages
    coroutines = [fetch(base_url, params) for params in paginate(total)]
    print("total number of pages: {}, total number of coroutines: {}".format(
        total, len(coroutines)))
    for routine in asyncio.as_completed(coroutines):
        r = await routine
        yield r["data"]["field"]  # a field in the data for each page

async def main():
    url = "http://arandomurl.com"
    results = []
    async for x in get_all_pages(url):
        results.append(x)
    print(len(results))  # prints 1 -> only the first element is returned

asyncio.run(main())
The problem is that the for loop in my main function only retrieves the first element of my generator; somehow the generator stops after yielding the first element. This means as_completed is not working the way I thought it would inside get_all_pages: yielding the result of each completed coroutine, which is then handed to the yield r["data"]["field"] line. How can I do this correctly?
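To pin down what I expect, here is a minimal, stdlib-only sketch of how I understand asyncio.as_completed to behave (work is a hypothetical stand-in for fetch):

```python
import asyncio

async def work(i):
    # later-submitted coroutines finish first, so completion order
    # is the reverse of submission order
    await asyncio.sleep((4 - i) * 0.02)
    return i

async def main():
    coroutines = [work(i) for i in range(5)]
    results = []
    # as_completed should yield an awaitable for each coroutine,
    # in the order the coroutines finish
    for routine in asyncio.as_completed(coroutines):
        results.append(await routine)
    return results

out = asyncio.run(main())
print(out)  # [4, 3, 2, 1, 0] -- all five results, in completion order
```

If as_completed works like this, the loop in get_all_pages should run once per page, not once in total.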
Here is a test program I wrote. I took the code posted in the question and replaced the body of the function "fetch" so that it returns dictionaries. With that change I can actually run the program, and it works: I get one item in "results" for every 100 "pages".
import asyncio

async def fetch(_url, params=None):
    if params is None:
        return {"data": {"total": 169}}
    return {"data": {"field": str(params)}}

async def get_all_pages(base_url):
    def paginate(size=10**6):
        limit = 100
        offset = 0
        while offset <= size:
            yield {"offset": offset, "limit": limit}
            offset += limit

    total = (await fetch(base_url))["data"]["total"]  # total number of pages
    coroutines = [fetch(base_url, params) for params in paginate(total)]
    print("total number of pages: {}, total number of coroutines: {}".format(
        total, len(coroutines)))
    for routine in asyncio.as_completed(coroutines):
        r = await routine
        yield r["data"]["field"]  # a field in the data for each page

async def main():
    url = "http://arandomurl.com"
    results = []
    async for x in get_all_pages(url):
        results.append(x)
    print(results)  # prints all items, one per 100 "pages"

asyncio.run(main())
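One side note: asyncio.as_completed yields results in completion order, not submission order, so the pages can arrive shuffled. If page order matters, asyncio.gather returns results in the order the coroutines were passed in. A minimal sketch (page is a hypothetical stand-in for fetch):

```python
import asyncio

async def page(n):
    # make later pages finish first, to show that gather
    # still preserves input order
    await asyncio.sleep((2 - n) * 0.02)
    return {"data": {"field": n}}

async def main():
    # unlike as_completed, gather returns a list ordered like its inputs
    results = await asyncio.gather(*[page(n) for n in range(3)])
    return [r["data"]["field"] for r in results]

fields = asyncio.run(main())
print(fields)  # [0, 1, 2] even though page 2 finished first
```

The trade-off is that gather hands back everything at once, whereas as_completed lets you process each page as soon as it arrives.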