I have been developing a program that checks whether websites are working. I read the URLs from an Excel sheet and write the results as True or False back to the same Excel sheet, but for some URLs I get a socket.timeout error and the code stops working after that. Here is the code:
import http.client as httpc
from urllib.parse import urlparse
import pandas as pd
import xlwings as xw
import smtplib
from xlsxwriter import Workbook
import socket
x=[]
df = pd.read_excel (r'xyz.xlsx')
df1=pd.DataFrame(df,columns=['URL'])
print(df1)
url_list=df["URL"].tolist()
print(url_list)
for i in url_list:
    def checkUrl(i):
        if 'http' not in i:
            i= 'https://'+i
        p = urlparse(i)
        conn = httpc.HTTPConnection(p.netloc,timeout=4)
        conn.request('HEAD', p.path)
        try:
            resp = conn.getresponse()
            return resp.status<400
        except requests.exceptions.RequestException:
            return False
    print(checkUrl(i))
    x.append(checkUrl(i))
workbook = Workbook('abc.xlsx')
Report_Sheet = workbook.add_worksheet()
Report_Sheet.write(0, 1, 'Value')
Report_Sheet.write_column(1, 1, x)
workbook.close()
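There are several problems in this code. The try: block only covers getresponse(), although conn.request() can fail as well, and a plain HTTPConnection is used even for https URLs. Moreover, requests.exceptions.RequestException cannot be thrown by your code. As you are not using the requests library but the low-level http.client, you should only expect errors from the socket library, which are all subclasses of OSError.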
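A quick way to see which exception classes an except OSError clause would cover is to ask Python directly. This is a small sketch for Python 3 using only the standard library; the tuple name SOCKET_LEVEL_ERRORS and the selection of classes in it are illustrative assumptions, not an exhaustive list.

```python
import http.client
import socket
import ssl

# Errors raised by the socket and ssl layers (an illustrative selection).
SOCKET_LEVEL_ERRORS = (
    socket.timeout,
    socket.gaierror,
    ssl.SSLError,
    ConnectionRefusedError,
)

for exc in SOCKET_LEVEL_ERRORS:
    # Each line prints the class name followed by True.
    print(exc.__name__, issubclass(exc, OSError))

# http.client's own protocol errors derive from Exception, not OSError,
# so this prints False.
print(issubclass(http.client.HTTPException, OSError))
```

So except OSError covers timeouts, DNS failures and SSL errors, while a protocol-level http.client.HTTPException would need its own clause if you want to catch it too.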
Your code could become (beware: untested):
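import http.client as httpc
from urllib.parse import urlparse

def checkUrl(i):
    if 'http' not in i:
        i = 'https://' + i
    p = urlparse(i)
    # Pick the connection class that matches the URL scheme
    if p.scheme == 'http':
        conn = httpc.HTTPConnection(p.netloc, timeout=4)
    else:
        conn = httpc.HTTPSConnection(p.netloc, timeout=4)
    try:
        # request() opens the socket, so it belongs inside the try block too
        conn.request('HEAD', p.path)
        resp = conn.getresponse()
        return resp.status < 400
    except OSError:
        # socket.timeout and the other socket errors are subclasses of OSError
        return False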
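Putting this together, the function can be defined once, outside the loop, and then applied to every URL. The following is a minimal sketch using only the standard library. The names check_url and check_all, the timeout parameter, the connection cleanup, and the extra http.client.HTTPException clause are assumptions added for illustration, not part of the original code.

```python
import http.client as httpc
from urllib.parse import urlparse


def check_url(url, timeout=4):
    """Return True if a HEAD request to url gets a status below 400."""
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    p = urlparse(url)
    conn_cls = httpc.HTTPConnection if p.scheme == "http" else httpc.HTTPSConnection
    conn = None
    try:
        # Creating the connection, sending the request and reading the
        # response can all fail, so all three sit inside the try block.
        conn = conn_cls(p.netloc, timeout=timeout)
        conn.request("HEAD", p.path or "/")
        return conn.getresponse().status < 400
    except (OSError, httpc.HTTPException):
        # OSError covers timeouts, DNS and SSL failures; HTTPException
        # covers malformed URLs or responses (an added assumption).
        return False
    finally:
        if conn is not None:
            conn.close()


def check_all(urls, timeout=4):
    """Check every URL and return a list of booleans in the same order."""
    return [check_url(u, timeout) for u in urls]
```

With the data from the question this would be called as x = check_all(url_list), which replaces the for loop and the nested function definition.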
In my experience this error happens when a hostname still resolves to a valid IP address, but the server at that address is no longer configured to serve that hostname. As a result, the server ignores your attempts to connect to it and the request times out.
To handle this, you should return False on timeout errors.
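import socket

try:
    resp = conn.getresponse()
    return resp.status<400
except socket.timeout as err:
    # Checked first so that a timeout is caught before the clause below is evaluated
    return False
except requests.exceptions.RequestException:
    # Kept from the question; requests is never imported there, so this clause cannot work as is
    return False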
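The timeout case can be reproduced locally without depending on a real website: a socket that listens but never answers behaves like a server that ignores the request. The sketch below assumes two hypothetical helpers, responds_in_time and silent_listener, and uses a short timeout purely for demonstration.

```python
import http.client
import socket


def responds_in_time(host, port, timeout=1.0):
    """Return True if host:port answers a HEAD request before the timeout."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        return conn.getresponse().status < 400
    except socket.timeout:
        # The connection was established but no response arrived in time.
        return False
    finally:
        conn.close()


def silent_listener():
    """Open a local socket that queues connections but never replies."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    sock.listen(1)
    return sock
```

Calling responds_in_time("127.0.0.1", port) with the port of a silent_listener() socket should return False once the timeout has elapsed, instead of raising socket.timeout.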
You will want to check for an http.client.HTTPException instead of a requests.exceptions.RequestException, because the check that you are doing uses the http.client library and not the requests library. In addition, you will also want to catch all of the following errors.
import socket
import ssl
import http.client

try:
    # The connection is opened by request(), so DNS and SSL errors are raised here
    conn.request('HEAD', p.path)
    resp = conn.getresponse()
    return resp.status < 400
except http.client.HTTPException as err:
    # A connection was established, but the request failed
    return False
except socket.timeout as err:
    # The server did not respond in time (for example, the website no longer exists on it)
    return False
except socket.gaierror as err:
    # Could not resolve the hostname to an IP address
    return False
except ssl.CertificateError as err:
    # The SSL certificate was never configured, or it cannot be trusted
    return False
except ssl.SSLError as err:
    # Other SSL errors not covered by ssl.CertificateError
    return False
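If the spreadsheet should record why a URL failed rather than just False, the same except clauses can return a short label instead. This is only a sketch under that assumption; the function name diagnose_url, the label strings, and the final catch-all OSError clause are hypothetical additions.

```python
import http.client
import socket
import ssl
from urllib.parse import urlparse


def diagnose_url(url, timeout=4):
    """Return "ok" or a short label describing why the check failed."""
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    p = urlparse(url)
    if p.scheme == "http":
        conn_cls = http.client.HTTPConnection
    else:
        conn_cls = http.client.HTTPSConnection
    conn = None
    try:
        conn = conn_cls(p.netloc, timeout=timeout)
        conn.request("HEAD", p.path or "/")
        status = conn.getresponse().status
        return "ok" if status < 400 else f"status {status}"
    except socket.timeout:
        return "timeout"
    except socket.gaierror:
        return "dns failure"
    except ssl.SSLError:
        return "ssl error"
    except OSError:
        # Listed after the specific classes above, which are all
        # subclasses of OSError (e.g. connection refused or reset).
        return "connection failed"
    except http.client.HTTPException:
        return "http error"
    finally:
        if conn is not None:
            conn.close()
```

The order of the clauses matters: Python uses the first matching except clause, so the generic OSError has to come after the more specific socket and ssl errors.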
First guess is that resp = conn.getresponse() should be inside the try clause. If that doesn't work, please add the output of the program.