Python 3.4 urllib.request error (HTTP 403)

Question

I am trying to open and parse an HTML page. In Python 2.7.8 I have no problem:

import urllib
url = "https://ipdb.at/ip/66.196.116.112"
html = urllib.urlopen(url).read()

Everything works fine. However, I want to move to Python 3.4, and there I get HTTP error 403 (Forbidden). My code:

import urllib.request
html = urllib.request.urlopen(url) # same URL as before

  File "C:\Python34\lib\urllib\request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 461, in open
    response = meth(req, response)
  File "C:\Python34\lib\urllib\request.py", line 574, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python34\lib\urllib\request.py", line 499, in error
    return self._call_chain(*args)
  File "C:\Python34\lib\urllib\request.py", line 433, in _call_chain
    result = func(*args)
  File "C:\Python34\lib\urllib\request.py", line 582, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

It works for other URLs that do not use https.

url = 'http://www.stopforumspam.com/ipcheck/212.91.188.166'

That one works fine.

Answer 1

It looks like the site does not like the user agent of Python 3.x.

Specifying a User-Agent will solve your problem:

import urllib.request
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
html = urllib.request.urlopen(req).read()
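To see what the site is presumably objecting to, the default header can be inspected without making any network call. This is a minimal sketch; the `www.example.com` URL is only a placeholder, and the exact version suffix of the default header depends on the Python release in use.

```python
import urllib.request

# Every opener starts with a default header list; in Python 3 it holds a
# User-agent of the form "Python-urllib/<version>".
default_headers = dict(urllib.request.build_opener().addheaders)
print(default_headers["User-agent"])

# A header set explicitly on the Request takes precedence over that default.
req = urllib.request.Request(
    "https://www.example.com",  # placeholder URL, not the one from the question
    headers={"User-Agent": "Mozilla/5.0"},
)
# Request stores header names with only the first letter capitalized.
print(req.get_header("User-agent"))
```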

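Note that the Python 2.x urllib version also receives a 403 status, but unlike Python 2.x urllib2 and Python 3.x urllib, it does not raise an exception.

You can confirm this with the following code: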

print(urllib.urlopen(url).getcode())  # => 403
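In Python 3 the same check has to go through the exception, because `urllib.request.urlopen` raises `urllib.error.HTTPError` for a 403. The following is a rough sketch rather than a canonical recipe; `get_status` and its `opener` parameter are hypothetical names introduced here so the function can be exercised without a live server.

```python
import urllib.error
import urllib.request


def get_status(url, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, including 4xx/5xx responses."""
    try:
        with opener(url) as response:
            return response.getcode()
    except urllib.error.HTTPError as err:
        # HTTPError carries the status code of the rejected request.
        return err.code
```

Calling `get_status(url)` on the URL from the question would then be expected to return the status code instead of raising a traceback.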

Answer 2

Here are some notes I gathered on urllib while learning Python 3.
I kept them in case they come in handy or help someone else.

How to import `urllib.request` and `urllib.parse`:

import urllib.request as urlRequest
import urllib.parse as urlParse

How to make a GET request:

url = "http://www.example.net"
# open the url
x = urlRequest.urlopen(url)
# get the source code
sourceCode = x.read()
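One thing that differs from Python 2 here: in Python 3, `read()` returns `bytes`, so the result usually needs decoding before it can be parsed as text. A small sketch, where `decode_body` is a hypothetical helper and the sample bytes stand in for a real response body:

```python
def decode_body(raw, charset=None):
    """Decode bytes from response.read() into str (UTF-8 unless told otherwise)."""
    return raw.decode(charset or "utf-8", errors="replace")


# Stand-in for x.read(); a real response would supply these bytes.
sample = b"<title>caf\xc3\xa9</title>"
print(decode_body(sample))
```

With a real response, `x.headers.get_content_charset()` returns the charset declared by the server (or `None`), which can be passed as the `charset` argument.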

How to make a POST request:

url = "https://www.example.com"
values = {"q": "python if"}
# encode values for the url
values = urlParse.urlencode(values)
# encode the values in UTF-8 format
values = values.encode("UTF-8")
# build the request object; passing data makes it a POST
targetUrl = urlRequest.Request(url, values)
# open the url
x  = urlRequest.urlopen(targetUrl)
# get the source code
sourceCode = x.read()
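What turns the request into a POST is the `data` argument, not a separate method call. The sketch below builds both variants without sending anything; the `www.example.com` URLs are placeholders.

```python
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({"q": "python if"})

# No data argument: the parameters travel in the query string (GET).
get_req = urllib.request.Request("https://www.example.com/?" + params)

# Bytes passed as data: the same parameters travel in the body (POST).
post_req = urllib.request.Request(
    "https://www.example.com/", data=params.encode("utf-8")
)

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```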

How to make a POST request (when the server responds with `403 forbidden`):

url = "https://www.example.com"
values = {"q": "python urllib"}
# pretend to be a chrome 47 browser on a windows 10 machine
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}
# encode values for the url
values = urlParse.urlencode(values)
# encode the values in UTF-8 format
values = values.encode("UTF-8")
# build the request object with the data and the custom headers
targetUrl = urlRequest.Request(url = url, data = values, headers = headers)
# open the url
x  = urlRequest.urlopen(targetUrl)
# get the source code
sourceCode = x.read()

How to make a GET request (when the server responds with `403 forbidden`):

url = "https://www.example.com"
# pretend to be a chrome 47 browser on a windows 10 machine
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}
req = urlRequest.Request(url, headers = headers)
# open the url
x = urlRequest.urlopen(req)
# get the source code
sourceCode = x.read()
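If many requests need the same header, one alternative to passing `headers` each time is an opener whose default header list has been replaced. This is a sketch under that assumption; `make_opener` is a hypothetical helper name, and the short `Mozilla/5.0` string is just an example value.

```python
import urllib.request


def make_opener(user_agent="Mozilla/5.0"):
    """Return an opener that sends user_agent with every request."""
    opener = urllib.request.build_opener()
    # Replace the default "Python-urllib/x.y" header list.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


opener = make_opener()
# opener.open(url) now sends the custom header. To make plain
# urllib.request.urlopen() use it as well, install it globally:
# urllib.request.install_opener(opener)
```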

2 answers

Answer 1: score 29, accepted, 2015-02-08 16:34:29

Answer 2: score 2
