How to prevent asyncio.TimeoutError from being raised and continue the loop

Question

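I am using aiohttp with a limited_as_completed helper to speed up scraping (roughly 100 million static website pages). However, the code stops after a few minutes with a TimeoutError. I have tried several things, but I still cannot stop asyncio.TimeoutError from being raised. How can I ignore the error and continue?

The code I am running is: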

N=123
import html
from lxml import etree
import requests
import asyncio 
import aiohttp
from aiohttp import ClientSession, TCPConnector
import pandas as pd
import re 
import csv 
import time
from itertools import islice
import sys
from contextlib import suppress

start = time.time()
data = {}
data['name'] = []
filename = "C:\\Users\\xxxx"+ str(N) + ".csv"

def limited_as_completed(coros, limit):
    futures = [
        asyncio.ensure_future(c)
        for c in islice(coros, 0, limit)
    ]
    async def first_to_finish():
        while True:
            await asyncio.sleep(0)
            for f in futures:
                if f.done():
                    futures.remove(f)
                    try:
                        newf = next(coros)
                        futures.append(
                            asyncio.ensure_future(newf))
                    except StopIteration as e:
                        pass
                    return f.result()
    while len(futures) > 0:
        yield first_to_finish()

async def get_info_byid(i, url, session):
    async with session.get(url,timeout=20) as resp:
        print(url)
        with suppress(asyncio.TimeoutError):
            r = await resp.text()
            name = etree.HTML(r).xpath('//h2[starts-with(text(),"Customer Name")]/text()')
            data['name'].append(name)
            dataframe = pd.DataFrame(data)
            dataframe.to_csv(filename, index=False, sep='|')

limit = 1000
async def print_when_done(tasks):
    for res in limited_as_completed(tasks, limit):
        await res

url = "http://xxx.{}.html"
loop = asyncio.get_event_loop()

async def main():
    connector = TCPConnector(limit=10)
    async with ClientSession(connector=connector,headers=headers,raise_for_status=False) as session:
        coros = (get_info_byid(i, url.format(i), session) for i in range(N,N+1000000))
        await print_when_done(coros)

loop.run_until_complete(main())
loop.close()
print("took", time.time() - start, "seconds.")

The error log is:

Traceback (most recent call last):
  File "C:\Users\xxx.py", line 111, in <module>
    loop.run_until_complete(main())
  File "C:\Users\xx\AppData\Local\Programs\Python\Python37-32\lib\asyncio\base_events.py", line 573, in run_until_complete
    return future.result()
  File "C:\Users\xxx.py", line 109, in main
    await print_when_done(coros)
  File "C:\Users\xxx.py", line 98, in print_when_done
    await res
  File "C:\Users\xxx.py", line 60, in first_to_finish
    return f.result()
  File "C:\Users\xxx.py", line 65, in get_info_byid
    async with session.get(url,timeout=20) as resp:
  File "C:\Users\xx\AppData\Local\Programs\Python\Python37-32\lib\site-packages\aiohttp\client.py", line 855, in __aenter__
    self._resp = await self._coro
  File "C:\Users\xx\AppData\Local\Programs\Python\Python37-32\lib\site-packages\aiohttp\client.py", line 391, in _request
    await resp.start(conn)
  File "C:\Users\xx\AppData\Local\Programs\Python\Python37-32\lib\site-packages\aiohttp\client_reqrep.py", line 770, in start
    self._continue = None
  File "C:\Users\xx\AppData\Local\Programs\Python\Python37-32\lib\site-packages\aiohttp\helpers.py", line 673, in __exit__
    raise asyncio.TimeoutError from None
concurrent.futures._base.TimeoutError

I tried: 1) adding except asyncio.TimeoutError: pass. It did not work.

async def get_info_byid(i, url, session):
    async with session.get(url,timeout=20) as resp:
        print(url)
        try:
            r = await resp.text()
            name = etree.HTML(r).xpath('//h2[starts-with(text(),"Customer Name")]/text()')
            data['name'].append(name)
            dataframe = pd.DataFrame(data)
            dataframe.to_csv(filename, index=False, sep='|')
        except asyncio.TimeoutError:
            pass

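2) suppress(asyncio.TimeoutError), as shown above. It did not work either.

I only learned aiohttp yesterday, so maybe something else in my code is wrong and causes the timeout error after just a few minutes of running? If anyone knows how to handle this, thank you very much!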

Answer 1

A simple example (not very elegant, but it works):

import asyncio
from aiohttp.client import ClientSession


class Wrapper:

    def __init__(self, session):
        self._session = session

    async def get(self, url):
        try:
            async with self._session.get(url, timeout=20) as resp:
                return await resp.text()
        except Exception as e:
            print(e)


loop = asyncio.get_event_loop()
wrapper = Wrapper(ClientSession())

responses = loop.run_until_complete(
    asyncio.gather(
        wrapper.get('http://google.com'),
        wrapper.get('http://google.com'),
        wrapper.get('http://google.com'),
        wrapper.get('http://google.com'),
        wrapper.get('http://google.com')
    )
)

print(responses)
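A likely reason this works where the attempts in the question did not: the traceback in the question shows the timeout being raised on the async with session.get(...) line itself (inside __aenter__), so a try or suppress block placed inside the async with body never sees it. The sketch below illustrates the difference with plain asyncio. FakeRequest is a hypothetical stand-in for the object returned by session.get(), not part of aiohttp, and it is assumed to time out while the context is being entered.

```python
import asyncio


class FakeRequest:
    """Hypothetical stand-in for session.get(): entering it times out."""

    async def __aenter__(self):
        await asyncio.sleep(0)
        raise asyncio.TimeoutError

    async def __aexit__(self, exc_type, exc, tb):
        return False


async def try_inside():
    # Mirrors the question: the try block sits inside the async with,
    # so it cannot catch an error raised while entering the context.
    async with FakeRequest():
        try:
            return "body"
        except asyncio.TimeoutError:
            return None


async def try_outside():
    # Mirrors the wrapper above: the try block encloses the async with.
    try:
        async with FakeRequest():
            return "body"
    except asyncio.TimeoutError:
        return None


async def demo():
    # return_exceptions=True hands back the escaped error as a value
    # so both outcomes can be inspected side by side.
    return await asyncio.gather(
        try_inside(), try_outside(), return_exceptions=True
    )


results = asyncio.run(demo())
print(results)
```

In this sketch the first entry of results is the escaped asyncio.TimeoutError, while the second is None because the error was caught and the coroutine returned normally.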

Answer 2

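What @Yurii Kramarenko did will certainly trigger an Unclosed client session error, because the session is never closed properly. What I recommend is something like this:

import asyncio
import aiohttp

async def main(urls, do_something, timeout):
    # do_something(session, url) is the caller's per-URL coroutine and
    # timeout is the session-wide timeout; the original snippet read both
    # from self, which is undefined in a plain function.
    # async with closes the session even if a task raises.
    async with aiohttp.ClientSession(timeout=timeout) as session:
        tasks = [do_something(session, url) for url in urls]
        await asyncio.gather(*tasks)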

Answer 3

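I like @jbxiaoyu's answer, but the timeout kwarg seems to take a special object, so I thought I would add that you need to create a ClientTimeout object and then pass it to the session, like this:

import asyncio
from aiohttp import ClientSession, ClientTimeout

# total caps the whole request (connect + read), in seconds.
timeout = ClientTimeout(total=600)

async def main(urls, do_something):  # do_something(session, url): caller's coroutine
    async with ClientSession(timeout=timeout) as session:
        tasks = [do_something(session, url) for url in urls]
        await asyncio.gather(*tasks)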
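A longer timeout only makes the error rarer, so it may still help to let the batch carry on when an individual request times out. One option, sketched here with plain asyncio, is to pass return_exceptions=True to asyncio.gather so that failures come back as values instead of aborting the whole call. The names fetch and fetch_with_timeout are hypothetical, and asyncio.sleep stands in for a real request.

```python
import asyncio


async def fetch(delay):
    # Hypothetical stand-in for a request: sleeps, then returns its delay.
    await asyncio.sleep(delay)
    return delay


async def fetch_with_timeout(delay, timeout):
    # asyncio.wait_for raises asyncio.TimeoutError once the deadline passes.
    return await asyncio.wait_for(fetch(delay), timeout=timeout)


async def main():
    delays = [0.01, 1.0, 0.02]  # the middle "request" is too slow
    return await asyncio.gather(
        *(fetch_with_timeout(d, timeout=0.2) for d in delays),
        return_exceptions=True,  # collect errors instead of raising
    )


outcomes = asyncio.run(main())
# Keep the successful results and count the timeouts separately.
ok = [r for r in outcomes if not isinstance(r, Exception)]
timed_out = [r for r in outcomes if isinstance(r, asyncio.TimeoutError)]
print(ok, len(timed_out))
```

Results stay in the same order as the inputs, so a failed entry can be matched back to its URL and retried or skipped.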

Answer 4

When I got this error, I realized that I was not connected to the VPN. So I suggest checking where you are making your requests from.

Answer 1: score 4, accepted, 2018-11-08 15:48:04
Answer 2: score 3, 2020-02-29 02:52:42
Answer 3: score 2, 2020-06-23 21:25:41
Answer 4: score -1, 2021-05-24 18:55:16
