
HTTP Error 403: Forbidden with urllib2 (Python 2.7)

I have been using urllib2 successfully, but it suddenly stopped working for the website I was testing. I have looked through the forum and tried some of the suggested fixes, but none of them work. Below is an example of one solution that worked for others but is not working for me. Can someone help me connect to this site?

The code that gives the error:

from bs4 import BeautifulSoup
import urllib2

proxy_support = urllib2.ProxyHandler({"http":"http://username:password@ip:port"})
hdr = {'Accept': 'text/html,application/xhtml+xml,*/*'}
url = 'http://www.carnextdoor.com.au/'
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
req = urllib2.Request(url, headers=hdr)
# The error occurs here, with or without the header, and also with a plain html = urllib2.urlopen(url).read()
html = urllib2.urlopen(req).read()
soup = BeautifulSoup(html, "html5lib")
print soup

I got a 403 until I added a User-Agent header; the following was enough for it to work for me:

hdr = {'Accept': 'text/html,application/xhtml+xml,*/*',"user-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36"}
url = 'http://www.carnextdoor.com.au/'


req = urllib2.Request(url, headers=hdr)
# With the user-agent header set, this call no longer returned a 403 for me
html = urllib2.urlopen(req).read()
soup = BeautifulSoup(html, "html5lib")
print soup
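
If the proxy from the question is still needed, one option is to set the headers on the opener itself so that every request made through it carries them. This is only a sketch: PROXY_URL is a placeholder and make_opener is a hypothetical helper, neither comes from the original post, and the try/except import is there only so the snippet also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2  # Python 2.7, as in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent

# Placeholder proxy address (assumption); substitute the real host and credentials
PROXY_URL = "http://127.0.0.1:8080"
USER_AGENT = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36")


def make_opener(proxy_url=PROXY_URL, user_agent=USER_AGENT):
    """Build an opener that uses the proxy and sends browser-like headers."""
    proxy_support = urllib2.ProxyHandler({"http": proxy_url})
    opener = urllib2.build_opener(proxy_support)
    # addheaders replaces the default Python-urllib User-agent for this opener
    opener.addheaders = [
        ("User-agent", user_agent),
        ("Accept", "text/html,application/xhtml+xml,*/*"),
    ]
    return opener


opener = make_opener()
```

Calling urllib2.install_opener(opener) afterwards, as in the question, would make urllib2.urlopen use this opener for later requests.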

Without user-agent:

In [10]: hdr = {'Accept': 'text/html,application/xhtml+xml,*/*'}

In [11]: url = 'http://www.carnextdoor.com.au/'

In [12]: req=urllib2.Request(url,headers=hdr)

In [13]: html = urllib2.urlopen(req).read()
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-13-dbeb64d95cd3> in <module>()
----> 1 html = urllib2.urlopen(req).read()
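
A plausible explanation, stated as an assumption since the site does not document its filtering: when no User-Agent is supplied, urllib2 identifies itself as Python-urllib/<version>, and some servers refuse that. A minimal sketch to inspect the default header (the import fallback is only there so it also runs on Python 3):

```python
try:
    import urllib2  # Python 2.7, as in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent

# A freshly built opener holds the headers sent with every request by default
opener = urllib2.build_opener()
default_headers = dict(opener.addheaders)
default_user_agent = default_headers["User-agent"]
print(default_user_agent)
```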

With user-agent:

In [20]: hdr = {'Accept': 'text/html,application/xhtml+xml,*/*',"user-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36"}

In [21]: req=urllib2.Request(url,headers=hdr)
In [22]: html = urllib2.urlopen(req).read()
In [23]: 
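
The two sessions above can be folded into a small helper that always sends a User-Agent and reports the status code instead of raising. This is a sketch under assumptions: build_request and fetch are hypothetical names, not part of urllib2, and the import fallback is only there so it also runs on Python 3.

```python
try:
    import urllib2  # Python 2.7, as in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent

BROWSER_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36")


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request carrying the Accept and User-Agent headers."""
    hdr = {
        "Accept": "text/html,application/xhtml+xml,*/*",
        "User-Agent": user_agent,
    }
    return urllib2.Request(url, headers=hdr)


def fetch(url):
    """Return the response body, or None if the server answers with an HTTP error."""
    try:
        return urllib2.urlopen(build_request(url)).read()
    except urllib2.HTTPError as err:
        # err.code holds the numeric status, e.g. 403
        print("HTTP error %s: %s" % (err.code, err.msg))
        return None


req = build_request("http://www.carnextdoor.com.au/")
```

With this in place, html = fetch(url) would either return the page body or print the status code and return None.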

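Using requests without setting a User-Agent explicitly also works fine (requests sends its own default, python-requests/<version>, rather than Python-urllib):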

In [28]: import requests

In [29]: r = requests.get(url)

In [30]: r.status_code
Out[30]: 200

