簡體   English   中英

HTTP 錯誤 403 in Python 3 web 抓取出版物

[英]HTTP error 403 in Python 3 web scraping the publications

這是我嘗試放置出版物的 URL 時發生的錯誤的回溯。 它適用於 Stack Overflow 或 Wikipedia 等常規網站,但當我在https://www.sciencedirect.com/science/article/pii/S1388248120302113?via%3Dihub等出版物上嘗試時,出現錯誤。

這是我的代碼:

req = Request(' https://www.sciencedirect.com/science/article/pii/S1388248120302113?via%3Dihub', headers={'User-Agent': 'Mozilla/5.0'})
html_plain = urlopen(req).read()

這是錯誤的回溯:

 File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 523, in open
    response = meth(req, response)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 632, in http_response       
    response = self.parent.error(
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 561, in error
    return self._call_chain(*args)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 494, in _call_chain
    result = func(*args)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 641, in http_error_default  
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

HTTP 403 Forbidden 客戶端錯誤狀態響應代碼表示服務器理解請求但拒絕授權。 這不是代碼中的錯誤,這是網站在您進行網絡抓取時拒絕為該頁面提供服務。

try:
    # If you're getting a 403 response, use this. ("HTTP error occurred: 403 Client Error: Forbidden")
    user_agent = 'Mozilla/5.0'
    response = requests.get(url, headers={'User-Agent': user_agent})
    # If the response was successful, no Exception will be raised
    response.raise_for_status()
except HTTPError as http_err:
    print(f'HTTP error occurred: {http_err}')  # Python 3.6
except Exception as err:
    print(f'Other error occurred: {err}')  # Python 3.6
else:

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM