How to bypass or catch the error of socket.timeout when checking if a website is working or not?

I have been developing a program that checks whether a website is working. I read URLs from an Excel sheet and write the results back to the same Excel sheet as True or False, but for some URLs I get a socket.timeout error and the code stops working after that. Here is the code:

import http.client as httpc
from urllib.parse import urlparse
import pandas as pd
import xlwings as xw
import smtplib
from xlsxwriter import Workbook


import socket


x=[]

df = pd.read_excel (r'xyz.xlsx')
df1=pd.DataFrame(df,columns=['URL'])
print(df1)
url_list=df["URL"].tolist()
print(url_list)
for i in url_list:
    def checkUrl(i):
        if 'http' not in i:
            i= 'https://'+i
        p = urlparse(i)
        conn = httpc.HTTPConnection(p.netloc,timeout=4)
        conn.request('HEAD', p.path)
        try:
            resp = conn.getresponse()
            return resp.status<400
        except requests.exceptions.RequestException:
            return False
    print(checkUrl(i))
    x.append(checkUrl(i))


workbook = Workbook('abc.xlsx')
Report_Sheet = workbook.add_worksheet()
Report_Sheet.write(0, 1, 'Value')
Report_Sheet.write_column(1, 1, x)

workbook.close()

There are many problems in this code.

  1. you unconditionally use HTTP even when the URL requires HTTPS
  2. you execute the request outside of the try: block
  3. the except clause expects a requests.exceptions.RequestException, which cannot be raised by your code

As you are not using the requests library but the low-level http.client, you should only expect errors from the socket library, which are all subclasses of OSError.

Your code could become (beware: untested):

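def checkUrl(i):
    # Assume HTTPS when the entry has no scheme
    if 'http' not in i:
        i = 'https://' + i
    p = urlparse(i)
    # Pick the connection class that matches the scheme
    if p.scheme == 'http':
        conn = httpc.HTTPConnection(p.netloc, timeout=4)
    else:
        conn = httpc.HTTPSConnection(p.netloc, timeout=4)
    try:
        # The request itself can fail, so it belongs inside the try block
        conn.request('HEAD', p.path)
        resp = conn.getresponse()
        return resp.status < 400
    except OSError:
        # Covers socket.timeout, DNS failures, refused connections and SSL errors
        return False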
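
The claim that the socket-level errors all derive from OSError can be checked directly. The following is a small sketch, assuming Python 3.7 or later (before 3.7, ssl.CertificateError was not an OSError subclass); the variable names are only illustrative. It also shows that http.client.HTTPException sits outside that hierarchy, so it needs its own except clause if you want to catch it.

```python
import http.client
import socket
import ssl

# Exceptions that a failed connection attempt commonly raises
network_errors = [
    socket.timeout,
    socket.gaierror,
    ssl.SSLError,
    ssl.CertificateError,
    ConnectionRefusedError,
]

# True when every listed exception can be caught with "except OSError"
all_os_errors = all(issubclass(exc, OSError) for exc in network_errors)

# HTTPException derives from Exception, not from OSError
http_exception_is_os_error = issubclass(http.client.HTTPException, OSError)

print(all_os_errors)
print(http_exception_is_os_error)
```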

In my experience, this error happens when a hostname resolves to a valid IP address, but the server at that address is no longer configured to serve that hostname. As a result, the server ignores your attempts to connect to it.

To handle this, you should return False on timeout errors.

    import socket

    try:
        resp = conn.getresponse()
        return resp.status<400
    except requests.exceptions.RequestException:
        return False
    except socket.timeout as err:
        return False
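
To see the timeout path without depending on a real website, here is a rough sketch that stands in for an unresponsive server with a local listening socket that never replies. The function name head_times_out and the 0.5 second default are made up for this illustration, and it assumes the operating system completes the TCP handshake for a listening socket even though accept() is never called (the usual behaviour).

```python
import http.client
import socket


def head_times_out(timeout=0.5):
    """Return True if a HEAD request to a silent local server times out."""
    # A listening socket that never accepts or answers: the connection is
    # established by the OS, but no HTTP response will ever arrive.
    silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    silent.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    silent.listen(1)
    port = silent.getsockname()[1]

    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        conn.getresponse()  # blocks until the timeout expires
        return False
    except socket.timeout:
        return True
    finally:
        conn.close()
        silent.close()
```

Calling head_times_out() should take roughly the timeout value to return, which is the same wait the original program hits for each unresponsive URL.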

You will want to catch http.client.HTTPException instead of requests.exceptions.RequestException, because your check uses the http.client library and not the requests library. In addition, you will also want to catch all of the following errors.

    import socket
    import ssl
    import http.client

    try:
        resp = conn.getresponse()
        return resp.status < 400
    except http.client.HTTPException as err:
        # A connection was established, but the request failed
        return False
    except socket.timeout as err:
        # No response within the timeout, e.g. the website is no longer served
        return False
    except socket.gaierror as err:
        # Could not resolve the hostname to an IP address
        return False
    except ssl.CertificateError as err:
        # The SSL certificate was never configured, or it cannot be trusted
        return False
    except ssl.SSLError as err:
        # Other SSL errors not covered by ssl.CertificateError
        return False
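
Putting the pieces together, one possible consolidated version looks like the sketch below. It is not the only way to structure this: the name check_url, the timeout parameter, and the rule that entries without a scheme default to HTTPS are assumptions made for the example. Because socket.timeout, socket.gaierror and the ssl errors are all OSError subclasses, a single except clause for OSError and http.client.HTTPException covers every case listed above.

```python
import http.client
from urllib.parse import urlparse


def check_url(url, timeout=4):
    """Return True if a HEAD request to url gets a status below 400."""
    # Assumption: entries without an explicit scheme are meant to be HTTPS
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    parts = urlparse(url)
    if parts.scheme == "http":
        conn_class = http.client.HTTPConnection
    else:
        conn_class = http.client.HTTPSConnection

    conn = None
    try:
        # Constructing the connection can already fail (e.g. invalid port)
        conn = conn_class(parts.netloc, timeout=timeout)
        conn.request("HEAD", parts.path or "/")
        return conn.getresponse().status < 400
    except (OSError, http.client.HTTPException):
        # Timeouts, DNS failures, refused connections, SSL and protocol errors
        return False
    finally:
        if conn is not None:
            conn.close()
```

With this in place, the loop from the question reduces to something like x = [check_url(u) for u in url_list], and one bad URL no longer stops the run.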

My first guess is that

conn.request('HEAD', p.path)

should be inside the try clause, together with conn.getresponse(). If that doesn't work, please add the output of the program.
