How to bypass or catch the socket.timeout error when checking whether a website is working?

I have been developing a program that checks whether a website is working. I fetch URLs from an Excel sheet and then write the results as True and False into the same Excel sheet, but for some URLs I get a socket.timeout error, and the code stops working after that. Here is the code:

import http.client as httpc
from urllib.parse import urlparse
import pandas as pd
import xlwings as xw
import smtplib
from xlsxwriter import Workbook


import socket


x=[]

df = pd.read_excel (r'xyz.xlsx')
df1=pd.DataFrame(df,columns=['URL'])
print(df1)
url_list=df["URL"].tolist()
print(url_list)
for i in url_list:
    def checkUrl(i):
        if 'http' not in i:
            i= 'https://'+i
        p = urlparse(i)
        conn = httpc.HTTPConnection(p.netloc,timeout=4)
        conn.request('HEAD', p.path)
        try:
            resp = conn.getresponse()
            return resp.status<400
        except requests.exceptions.RequestException:
            return False
    print(checkUrl(i))
    x.append(checkUrl(i))


workbook = Workbook('abc.xlsx')
Report_Sheet = workbook.add_worksheet()
Report_Sheet.write(0, 1, 'Value')
Report_Sheet.write_column(1, 1, x)

workbook.close()

There are many problems in this code.

  1. you unconditionally use HTTP even when the URL requires HTTPS
  2. you execute the request outside of the try:
  3. the except clause expects a requests.exceptions.RequestException, which cannot be thrown by your code

As you are not using the requests library but the low-level http.client, you should only expect errors from the socket library, which are all subclasses of OSError.
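The subclass relationship can be checked directly in an interpreter. The snippet below is only a small sanity check, assuming Python 3.7 or later; the variable names are made up for illustration. It also shows that http.client.HTTPException is not an OSError, so malformed HTTP responses would need their own except clause if you want to treat them as failures too.

```python
import http.client
import socket
import ssl

# Exception types that a connection attempt can raise at the socket/TLS level.
network_errors = [
    socket.timeout,
    socket.gaierror,
    ConnectionRefusedError,
    ssl.SSLError,
    ssl.CertificateError,
]

# Map each exception name to whether "except OSError" would catch it.
caught_by_oserror = {exc.__name__: issubclass(exc, OSError) for exc in network_errors}
print(caught_by_oserror)

# Protocol-level errors from http.client derive from Exception, not OSError.
http_error_is_oserror = issubclass(http.client.HTTPException, OSError)
print("HTTPException is an OSError:", http_error_is_oserror)
```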

Your code could become (beware: untested):

def checkUrl(i):
    if 'http' not in i:
        i= 'https://'+i
    p = urlparse(i)
    if (p.scheme == 'http'):
        conn = httpc.HTTPConnection(p.netloc,timeout=4)
    else:
        conn = httpc.HTTPSConnection(p.netloc,timeout=4)
    try:
        conn.request('HEAD', p.path)
        resp = conn.getresponse()
        return resp.status<400
    except OSError:
        return False
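As a possible way to use the function above, here is a sketch that defines the checker once (rather than inside the loop, as in the question) and calls it a single time per URL. The names check_url and check_all and the timeout parameter are assumptions made for this example, not part of the original code; the sketch also closes the connection and treats http.client.HTTPException as a failure.

```python
import http.client as httpc
from urllib.parse import urlparse


def check_url(url, timeout=4):
    """Return True if a HEAD request gets a status below 400, else False."""
    # Assume HTTPS when the sheet contains a bare host name.
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    p = urlparse(url)
    conn_cls = httpc.HTTPConnection if p.scheme == "http" else httpc.HTTPSConnection
    conn = conn_cls(p.netloc, timeout=timeout)
    try:
        # The connection is opened here, so this call must be inside the try.
        conn.request("HEAD", p.path or "/")
        return conn.getresponse().status < 400
    except (OSError, httpc.HTTPException):
        # Timeouts, DNS failures, refused connections, TLS and protocol errors.
        return False
    finally:
        conn.close()


def check_all(urls, timeout=4):
    """Check every URL once and return the list of True/False results."""
    return [check_url(u, timeout) for u in urls]
```

With the question's variables, `x = check_all(url_list)` would then replace the whole for loop.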

In my experience this error happens when a hostname resolves to a valid IP address, but the server is no longer configured to work with that hostname. This results in the server ignoring your attempts to connect to it.

To handle this, you should return False on timeout errors.

    import socket

    try:
        resp = conn.getresponse()
        return resp.status<400
    except requests.exceptions.RequestException:
        return False
    except socket.timeout as err:
        return False

You will want to check for an http.client.HTTPException instead of a requests.exceptions.RequestException, because the check you are doing uses the http.client library and not the requests library. In addition, you will also want to catch all of the following errors.

    import socket
    import ssl
    import http.client

    try:
        resp = conn.getresponse()
        return resp.status < 400
    except http.client.HTTPException as err:
        # A connection was established, but the request failed
        return False 
    except socket.timeout as err:
        # The website no longer exists on the server
        return False
    except socket.gaierror as err:
        # Could not resolve the hostname to an IP address
        return False
    except ssl.CertificateError as err:
        # The SSL certificate was never configured, or it cannot be trusted
        return False
    except ssl.SSLError as err:
        # Other SSL errors not covered by ssl.CertificateError
        return False
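Putting these clauses into a complete function could look like the sketch below. It is an illustration under some assumptions: the name probe, the reason labels, and the final generic OSError clause are invented for this example, and the URL is assumed to already include its scheme. Two details matter: conn.request() sits inside the try because that is where the connection (and therefore DNS lookup and TLS handshake) happens, and ssl.CertificateError is listed before ssl.SSLError because in Python 3.7+ it is a subclass of it.

```python
import http.client
import socket
import ssl
from urllib.parse import urlparse


def probe(url, timeout=4):
    """Return (ok, reason); the reason strings are labels chosen for this sketch."""
    p = urlparse(url)
    cls = http.client.HTTPSConnection if p.scheme == "https" else http.client.HTTPConnection
    conn = cls(p.netloc, timeout=timeout)
    try:
        conn.request("HEAD", p.path or "/")  # connecting happens here
        status = conn.getresponse().status
        return status < 400, f"HTTP {status}"
    except http.client.HTTPException:
        return False, "bad HTTP response"
    except socket.timeout:
        return False, "timeout"
    except socket.gaierror:
        return False, "DNS lookup failed"
    except ssl.CertificateError:
        return False, "certificate error"
    except ssl.SSLError:
        return False, "SSL error"
    except OSError:
        # Anything else at the socket level, e.g. a refused connection.
        return False, "connection failed"
    finally:
        conn.close()
```

Returning the reason alongside the boolean makes it possible to write a second column to the spreadsheet explaining why a URL was marked False.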

First guess is that

resp = conn.getresponse()

should be inside the try clause. If that doesn't work, please add the output of the program.
