
Python 3.4 urllib.request error (http 403)

I'm trying to open and parse an HTML page. In Python 2.7.8 I have no problem:

import urllib
url = "https://ipdb.at/ip/66.196.116.112"
html = urllib.urlopen(url).read()

and everything is fine. However, I want to move to Python 3.4, and there I get HTTP error 403 (Forbidden). My code:

import urllib.request
html = urllib.request.urlopen(url) # same URL as before

File "C:\Python34\lib\urllib\request.py", line 153, in urlopen
return opener.open(url, data, timeout)
File "C:\Python34\lib\urllib\request.py", line 461, in open
response = meth(req, response)
File "C:\Python34\lib\urllib\request.py", line 574, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python34\lib\urllib\request.py", line 499, in error
return self._call_chain(*args)
File "C:\Python34\lib\urllib\request.py", line 433, in _call_chain
result = func(*args)
File "C:\Python34\lib\urllib\request.py", line 582, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

It works for other URLs which don't use https.

url = 'http://www.stopforumspam.com/ipcheck/212.91.188.166'

is ok.

It seems like the site does not like the user agent of Python 3.x.
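To see which user agent is meant here, the short sketch below prints the default header that `urllib.request` attaches to every request. It only inspects a freshly built opener and makes no network call; the variable names are illustrative.

```python
import urllib.request

# A fresh opener carries the headers urllib adds when you set none yourself.
opener = urllib.request.build_opener()

# addheaders is a list of (name, value) tuples; the default entry is the User-agent.
default_agent = dict(opener.addheaders)["User-agent"]
print(default_agent)  # something like Python-urllib/3.x, depending on the interpreter version
```

Some sites reject requests whose User-Agent starts with `Python-urllib`, which would match the 403 seen in the question.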

Specifying a User-Agent will solve your problem:

import urllib.request
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
html = urllib.request.urlopen(req).read()

NOTE: The Python 2.x urllib version also receives a 403 status, but unlike Python 2.x urllib2 and Python 3.x urllib, it does not raise an exception.

You can confirm that with the following code:

print(urllib.urlopen(url).getcode())  # Python 2.x urllib; => 403
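The behaviour described in this answer can be reproduced offline. The sketch below is only an illustration under stated assumptions. `PickyHandler` is a hypothetical local stand-in for a site that rejects urllib's default User-Agent, not the real server from the question. `fetch_status` is a made-up helper that returns the status code whether or not Python 3 raises `HTTPError`. An opener with an empty `ProxyHandler` is used instead of `urlopen` so the local request is never routed through a system proxy; it raises `HTTPError` in the same way.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PickyHandler(BaseHTTPRequestHandler):
    """Hypothetical server that refuses urllib's default User-Agent."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if agent.startswith("Python-urllib"):
            self.send_error(403)
            return
        body = b"<html>ok</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Behaves like urlopen(), but ignores any proxy settings in the environment.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_status(url, headers=None):
    """Return the HTTP status code, catching the HTTPError raised for 4xx/5xx."""
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with opener.open(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), PickyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
local_url = "http://127.0.0.1:%d/" % server.server_port

default_status = fetch_status(local_url)
browser_status = fetch_status(local_url, {"User-Agent": "Mozilla/5.0"})
print(default_status, browser_status)

server.shutdown()
server.server_close()
```

Against this stand-in server, the request with the default headers is refused and the one with a browser-like User-Agent succeeds.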

Here are some notes on urllib that I gathered while studying Python 3. I kept them in case they might come in handy or help someone else out.

How to import urllib.request and urllib.parse:

import urllib.request as urlRequest
import urllib.parse as urlParse

How to make a GET request:

url = "http://www.example.net"
# open the url
x = urlRequest.urlopen(url)
# get the source code
sourceCode = x.read()
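Note that `read()` returns bytes, not text. The sketch below shows one possible way to decode the response using the charset the server declares. `read_text` and its `fallback` parameter are hypothetical names. A `data:` URL (supported by urllib since Python 3.4) stands in for a real page so the example runs without network access.

```python
import urllib.request


def read_text(url, fallback="utf-8"):
    """Fetch a URL and decode the body with the declared charset, if there is one."""
    with urllib.request.urlopen(url) as resp:
        # get_content_charset() returns None when the Content-Type names no charset.
        charset = resp.headers.get_content_charset() or fallback
        return resp.read().decode(charset)


# data: URL standing in for a real page; the payload is the percent-encoded "<p>hello</p>".
page = read_text("data:text/html;charset=utf-8,%3Cp%3Ehello%3C%2Fp%3E")
print(page)
```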

How to make a POST request:

url = "https://www.example.com"
values = {"q": "python if"}
# encode values for the url
values = urlParse.urlencode(values)
# encode the values in UTF-8 format
values = values.encode("UTF-8")
# build the request object; supplying data makes it a POST
targetUrl = urlRequest.Request(url, values)
# open the url
x  = urlRequest.urlopen(targetUrl)
# get the source code
sourceCode = x.read()
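To make the two encoding steps and the switch from GET to POST visible, here is a small sketch that builds the same request without sending it. The variable names are illustrative and nothing goes over the network.

```python
import urllib.parse
import urllib.request

values = {"q": "python if"}

# urlencode produces a form-encoded str; spaces become "+".
encoded = urllib.parse.urlencode(values)

# Request expects the body as bytes, hence the explicit encode step.
body = encoded.encode("UTF-8")

post_req = urllib.request.Request("https://www.example.com", body)
get_req = urllib.request.Request("https://www.example.com")

print(encoded)                # q=python+if
print(post_req.get_method())  # POST, because data was supplied
print(get_req.get_method())   # GET, because no data was supplied
```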

How to make a POST request (when you get 403 Forbidden responses):

url = "https://www.example.com"
values = {"q": "python urllib"}
# pretend to be a chrome 47 browser on a windows 10 machine
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}
# encode values for the url
values = urlParse.urlencode(values)
# encode the values in UTF-8 format
values = values.encode("UTF-8")
# build the request object with the data and the custom headers
targetUrl = urlRequest.Request(url = url, data = values, headers = headers)
# open the url
x  = urlRequest.urlopen(targetUrl)
# get the source code
sourceCode = x.read()

How to make a GET request (when you get 403 Forbidden responses):

url = "https://www.example.com"
# pretend to be a chrome 47 browser on a windows 10 machine
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}
req = urlRequest.Request(url, headers = headers)
# open the url
x = urlRequest.urlopen(req)
# get the source code
sourceCode = x.read()
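If every request in a script needs the same User-Agent, one option is to set it once on an opener instead of repeating the `headers` argument. This is a sketch of that alternative rather than part of the original notes. The names `BROWSER_AGENT` and `opener` are illustrative, and no request is sent.

```python
import urllib.request

# Same browser-like User-Agent string as in the examples above.
BROWSER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
)

opener = urllib.request.build_opener()
# Replace the default Python-urllib entry; applied to requests that do not set this header.
opener.addheaders = [("User-Agent", BROWSER_AGENT)]

# Optional: make plain urllib.request.urlopen() calls use this opener from now on.
urllib.request.install_opener(opener)

print(opener.addheaders)
```

After `install_opener`, a plain `urllib.request.urlopen(url)` sends the browser-like header, while a `Request` that sets its own User-Agent still takes precedence.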
