How to solve requests.exceptions.ConnectionError: ('Connection aborted.') in Python web scraping?

I am trying to fix the following error, but I cannot find a solution. Can anyone help me with this? When I run this code, it sometimes works, but sometimes it fails with the error shown below. Here is the code:

import requests
from bs4 import BeautifulSoup
import mysql.connector

mydb = mysql.connector.connect(host="localhost", user="root", passwd="", database="python_db")
mycursor = mydb.cursor()
# Example page: https://csr.gov.in/companyprofile.php?year=FY%202014-15&CIN=U01224KA1980PLC003802

# Fetch every CIN that has not been scraped yet (csr_status = 0)
mycursor.execute("SELECT cin_no FROM tn_cin WHERE csr_status=0")
urls = mycursor.fetchall()

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36', "Accept-Language": "en-US,en;q=0.9", "Accept-Encoding": "gzip, deflate"}
csr_link = 'https://csr.gov.in/companyprofile.php?year=FY%202014-15&CIN='

for url in urls:
    # Each row is a 1-tuple; use a name that does not shadow the built-in str
    cin = url[0]
    link = csr_link + cin
    response = requests.get(link, headers=headers)
    bs = BeautifulSoup(response.text, "html.parser")
    div_table = bs.find('div', id='colfy4')
    if div_table is not None:
        # find_all() returns a (possibly empty) list, never None
        fy_table = div_table.find_all('table', id='employee_data')
        for tr in fy_table:
            td = tr.find_all('td')
            if len(td) > 0:
                rows = [i.text for i in td]
                row1, row2, row3, row4 = rows[:4]
                # Store the four CSR figures for this company
                mycursor.execute(
                    "INSERT INTO csr_details(cin_no,average_net_profit,csr_prescribed_expenditure,csr_spent,local_area_spent) VALUES(%s,%s,%s,%s,%s)",
                    (cin, row1, row2, row3, row4))
                # Mark this CIN as done so it is skipped on the next run
                status_update = "UPDATE tn_cin SET csr_status=%s WHERE cin_no=%s"
                mycursor.execute(status_update, ('1', cin))
                mydb.commit()

After running the above code, I sometimes get the following error:

requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

This error usually originates on the server side. It means that the server (or a proxy or firewall in front of it) closed the connection before sending any response. Because no response arrives, there is no HTTP status code at all, but the underlying cause is typically the kind of server problem that would otherwise show up as a 5xx error.
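Because the failure is intermittent, it can also help to catch the exception and retry the request after a short pause. The following is a minimal sketch, not part of the original answer: `get_with_retries` is a hypothetical helper, and the number of retries, the exponential back-off and the 30-second timeout are assumed values to tune. The `session` and `sleep` parameters exist only so the helper can be exercised without network access.

```python
import time

import requests


def get_with_retries(url, headers=None, retries=3, backoff=1.0,
                     session=None, sleep=time.sleep):
    """Hypothetical helper: GET `url`, retrying on ConnectionError.

    Waits backoff, 2*backoff, 4*backoff, ... seconds between attempts and
    re-raises the last ConnectionError if every attempt fails.
    """
    session = session or requests.Session()
    for attempt in range(retries):
        try:
            # A timeout keeps a stalled connection from hanging forever (assumed value)
            return session.get(url, headers=headers, timeout=30)
        except requests.exceptions.ConnectionError:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller see the error
            sleep(backoff * 2 ** attempt)
```

In the question's loop, `response = requests.get(link, headers=headers)` would become `response = get_with_retries(link, headers=headers)`.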

I suspect it is caused by this line:

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36', "Accept-Language": "en-US,en;q=0.9", "Accept-Encoding": "gzip, deflate"}

because some servers do not accept particular header values. You can simply try sending only a minimal User-Agent header:

response = requests.get(link, headers={"User-Agent": "Mozilla/5.0"})

and see if that solves your problem.
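To compare what the two header variants actually put on the wire, one can prepare the request locally without contacting the server. This is a sketch, not part of the original answer: it relies on `requests.Session.prepare_request`, which merges the session's default headers with the ones passed in; `outgoing_headers` is a hypothetical helper name, and the CIN in `URL` is the example from the question's code comment.

```python
import requests

FULL_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate",
}
MINIMAL_HEADERS = {"User-Agent": "Mozilla/5.0"}
URL = "https://csr.gov.in/companyprofile.php?year=FY%202014-15&CIN=U01224KA1980PLC003802"


def outgoing_headers(url, headers):
    """Hypothetical helper: headers requests would send for a GET, without sending it."""
    with requests.Session() as session:
        # prepare_request merges session defaults with the per-request headers
        prepared = session.prepare_request(requests.Request("GET", url, headers=headers))
    return dict(prepared.headers)


if __name__ == "__main__":
    for name, hdrs in (("full", FULL_HEADERS), ("minimal", MINIMAL_HEADERS)):
        print(name, outgoing_headers(URL, hdrs))
```

Comparing the two printouts shows which extra headers the original request carries; in both cases requests also adds its own defaults, such as `Accept` and `Connection`.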

See this answer for User-Agent strings for a variety of browsers.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.
