
How to solve requests.exceptions.ConnectionError: ('Connection aborted.') in python web scraping?

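I am trying to fix the following error, but I have not found any solution. Can anyone help me with this? When I run this code, it sometimes runs fine, but sometimes it shows the error below. Here is the code that produces the error: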

import requests
from bs4 import BeautifulSoup
import mysql.connector

mydb = mysql.connector.connect(host="localhost", user="root",passwd="", database="python_db")
mycursor = mydb.cursor()
#url="https://csr.gov.in/companyprofile.php?year=FY%202014-15&CIN=U01224KA1980PLC003802"
#query1 = "INSERT INTO csr_details(average_net_profit,csr_prescribed_expenditure,csr_spent,local_area_spent) VALUES()"
mycursor.execute("SELECT cin_no FROM tn_cin WHERE csr_status=0")
urls=mycursor.fetchall()
#print(urls)

def convertTuple(tup):
    # Join the pieces of a value into one string (a plain string comes back unchanged)
    return ''.join(tup)

for url in urls:
    # Named `cin` rather than `str` so the built-in str type is not shadowed
    cin = convertTuple(url[0])
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36', "Accept-Language": "en-US,en;q=0.9", "Accept-Encoding": "gzip, deflate"}
    csr_link = 'https://csr.gov.in/companyprofile.php?year=FY%202014-15&CIN='
    link = csr_link + cin
    #print(link)
    response = requests.get(link, headers=headers)
    #print(response.status_code)
    bs = BeautifulSoup(response.text, "html.parser")
    div_table = bs.find('div', id='colfy4')
    if div_table is not None:
        # find_all returns a (possibly empty) list, never None, so no extra check is needed
        fy_table = div_table.find_all('table', id='employee_data')
        for tr in fy_table:
            td = tr.find_all('td')
            if len(td) > 0:
                rows = [i.text for i in td]
                row1 = rows[0]
                row2 = rows[1]
                row3 = rows[2]
                row4 = rows[3]
                mycursor.execute("INSERT INTO csr_details(cin_no,average_net_profit,csr_prescribed_expenditure,csr_spent,local_area_spent) VALUES(%s,%s,%s,%s,%s)", (cin, row1, row2, row3, row4))
                # Mark this CIN as processed so it is skipped on the next run
                status_update = "UPDATE tn_cin SET csr_status=%s WHERE cin_no=%s"
                data = ('1', cin)
                mycursor.execute(status_update, data)
                mydb.commit()

After running the code above, I get the following error:

requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

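The error

requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

usually originates on the server side. Unlike a 5xx error, it comes with no HTTP status code: it simply means the server closed the connection before delivering any response.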
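Because requests raises this failure as an exception instead of returning a response, the scraping loop stops at the first request the server drops. A minimal sketch of catching it so that one failed URL does not end the whole run is shown below. The helper name `fetch`, its `get` parameter (there only so the function can be exercised without a network), and the 10-second timeout are assumptions for illustration, not part of the original code.

```python
import requests


def fetch(link, headers=None, timeout=10, get=requests.get):
    """Return the response for `link`, or None if the connection fails.

    `fetch`, `get`, and the default timeout are hypothetical choices
    made for this sketch.
    """
    try:
        return get(link, headers=headers, timeout=timeout)
    except requests.exceptions.ConnectionError as exc:
        # The server dropped the connection, so there is no status code to inspect.
        print(f"Connection failed for {link}: {exc}")
        return None
```

Inside the loop, `response = fetch(link, headers=headers)` followed by `if response is None: continue` would skip the failed CIN. Since its `csr_status` stays 0, it would be picked up again on the next run.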

I believe this may be caused by this line:

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36', "Accept-Language": "en-US,en;q=0.9", "Accept-Encoding": "gzip, deflate"}

In some cases the header values are the problem. You can simply try setting the header to:

response=requests.get(link, headers={"User-Agent":"Mozilla/5.0"})

and see if that solves your problem.

For user agents of various browsers, see this answer.
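If the error still shows up intermittently after simplifying the header, retrying the request automatically is a common complement. The sketch below mounts urllib3's `Retry` on a `requests.Session` through `HTTPAdapter`. The function name `make_session`, the retry count, the backoff factor, and the status list are assumptions picked for illustration, not values known to suit csr.gov.in.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total_retries=3, backoff=1.0):
    """Build a Session that retries dropped connections (hypothetical helper)."""
    retry = Retry(
        total=total_retries,
        connect=total_retries,
        read=total_retries,
        backoff_factor=backoff,                  # wait longer between successive attempts
        status_forcelist=(500, 502, 503, 504),   # also retry these server errors
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    # Same simplified header as suggested above
    session.headers.update({"User-Agent": "Mozilla/5.0"})
    return session
```

To use it, create the session once before the loop with `session = make_session()` and call `session.get(link, timeout=10)` in place of `requests.get(link, headers=headers)`. Reusing one session also keeps the connection open between requests, which may reduce the load on the server.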
