How to bypass or catch the socket.timeout error when checking whether a website is working?
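I have been developing a program to check whether websites are working. I read the URLs from an Excel sheet and paste the results back into the same Excel sheet as True and False. For some URLs, however, I get a socket.timeout error, after which the code stops working. Here is the code: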
import http.client as httpc
from urllib.parse import urlparse
import pandas as pd
import xlwings as xw
import smtplib
from xlsxwriter import Workbook
import socket

x=[]
df = pd.read_excel (r'xyz.xlsx')
df1=pd.DataFrame(df,columns=['URL'])
print(df1)
url_list=df["URL"].tolist()
print(url_list)
for i in url_list:
    def checkUrl(i):
        if 'http' not in i:
            i= 'https://'+i
        p = urlparse(i)
        conn = httpc.HTTPConnection(p.netloc,timeout=4)
        conn.request('HEAD', p.path)
        try:
            resp = conn.getresponse()
            return resp.status<400
        except requests.exceptions.RequestException:
            return False
    print(checkUrl(i))
    x.append(checkUrl(i))
workbook = Workbook('abc.xlsx')
Report_Sheet = workbook.add_worksheet()
Report_Sheet.write(0, 1, 'Value')
Report_Sheet.write_column(1, 1, x)
workbook.close()
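There are several problems with this code:
- the request is sent outside the try: block;
- HTTPConnection is used even for https URLs;
- it catches requests.exceptions.RequestException, but since you are not using the requests library but the low-level http.client, you should only expect errors from the socket library, all of which are subclasses of OSError.
Your code could become (beware: untested):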
def checkUrl(i):
    if 'http' not in i:
        i= 'https://'+i
    p = urlparse(i)
    if (p.scheme == 'http'):
        conn = httpc.HTTPConnection(p.netloc,timeout=4)
    else:
        conn = httpc.HTTPSConnection(p.netloc,timeout=4)
    try:
        conn.request('HEAD', p.path)
        resp = conn.getresponse()
        return resp.status<400
    except OSError:
        return False
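The claim that the socket-level errors share OSError as a base class can be checked directly. The following is a small sketch; the list name network_errors is only illustrative. It also shows that http.client.HTTPException is not derived from OSError, so a malformed HTTP response may need its own except clause.

```python
import http.client
import socket
import ssl

# Exception types raised by the socket/ssl layers underneath http.client.
network_errors = [
    socket.timeout,
    socket.gaierror,
    ConnectionRefusedError,
    ssl.SSLError,
]

for exc_type in network_errors:
    # Each of these is expected to be a subclass of OSError.
    print(exc_type.__name__, issubclass(exc_type, OSError))

# HTTPException derives from Exception directly, not from OSError.
print("HTTPException", issubclass(http.client.HTTPException, OSError))
```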
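In my experience, this error occurs when the hostname still resolves to a valid IP address, but the server is no longer configured to serve that hostname. As a result, the server ignores your attempts to connect to it.
To handle this, you should return False when the timeout error occurs.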
import socket

try:
    resp = conn.getresponse()
    return resp.status<400
except requests.exceptions.RequestException:
    return False
except socket.timeout as err:
    return False
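To see the timeout branch in action without depending on a real website, here is a minimal sketch that points the check at a local socket which accepts connections but never replies. The helper name responds_in_time, the 0.5-second timeout and the loopback address are assumptions made for the illustration, not part of the original code.

```python
import http.client
import socket


def responds_in_time(host, port, timeout=0.5):
    """Return True if the server answers a HEAD request with status < 400."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        resp = conn.getresponse()
        return resp.status < 400
    except socket.timeout:
        # The connection was accepted, but no response arrived in time.
        return False
    finally:
        conn.close()


# A listening socket that never answers, standing in for a silent server.
silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
silent.listen(1)
port = silent.getsockname()[1]

result = responds_in_time("127.0.0.1", port)
silent.close()
print(result)
```

Because the listener never calls accept() or sends data, getresponse() waits until the timeout expires and the except socket.timeout branch returns False instead of crashing the loop.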
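You will need to check for http.client.HTTPException instead of requests.exceptions.RequestException, because the check you are performing uses the http.client library rather than the requests library. In addition, you will want to catch all of the following errors.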
import socket
import ssl
import http.client

try:
    resp = conn.getresponse()
    return resp.status < 400
except http.client.HTTPException as err:
    # A connection was established, but the request failed
    return False
except socket.timeout as err:
    # The website no longer exists on the server
    return False
except socket.gaierror as err:
    # Could not resolve the hostname to an IP address
    return False
except ssl.CertificateError as err:
    # The SSL certificate was never configured, or it cannot be trusted
    return False
except ssl.SSLError as err:
    # Other SSL errors not covered by ssl.CertificateError
    return False
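If you would rather record why a URL failed than just store False, the same exception classes can be mapped to a short reason. This is only a sketch: describe_failure and its reason strings are hypothetical names, and the ordering assumes Python 3.7 or later, where ssl.CertificateError is a subclass of ssl.SSLError.

```python
import http.client
import socket
import ssl


def describe_failure(err):
    """Map an exception from a URL check to a short reason string."""
    # Order matters: the more specific classes must be tested first,
    # because the socket and ssl errors are all subclasses of OSError.
    if isinstance(err, http.client.HTTPException):
        return "bad HTTP response"
    if isinstance(err, socket.timeout):
        return "timed out"
    if isinstance(err, socket.gaierror):
        return "hostname not resolved"
    if isinstance(err, ssl.CertificateError):
        return "certificate problem"
    if isinstance(err, ssl.SSLError):
        return "other SSL error"
    if isinstance(err, OSError):
        return "other network error"
    # Anything else is unexpected and should not be silently swallowed.
    raise err


print(describe_failure(socket.timeout()))
print(describe_failure(ConnectionRefusedError()))
```

Inside checkUrl, this could be called from a single `except (http.client.HTTPException, OSError) as err:` clause, with the returned reason written to a second column of the report.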
My first guess is that
resp = conn.getresponse()
should be inside the try clause. If that does not work, please add the output of your program.