HTTP Error 403: Forbidden urllib2 Python 2.7

Question

I have been using urllib2 successfully, but for the website I am testing it suddenly stopped working. I have looked through the forum and tried some of the suggested fixes, but none of them seems to work. Below is one of the approaches that solved the problem for others but does not work for me. Can someone help me connect to this site?

The code that gives the error:

from bs4 import BeautifulSoup
import urllib2

proxy_support = urllib2.ProxyHandler({"http":"http://username:password@ip:port"})
hdr = {'Accept': 'text/html,application/xhtml+xml,*/*'}
url = 'http://www.carnextdoor.com.au/'
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
req=urllib2.Request(url,headers=hdr)
# The error is raised here, with or without the header, and also with html = urllib2.urlopen(url).read()
html = urllib2.urlopen(req).read()
soup=BeautifulSoup(html,"html5lib")
print soup

Answer 1

I got a 403 until I added a User-Agent header. The following was enough to make it work for me:

from bs4 import BeautifulSoup
import urllib2

# Same Accept header as in the question, plus a browser-like User-Agent
hdr = {'Accept': 'text/html,application/xhtml+xml,*/*',
       'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'}
url = 'http://www.carnextdoor.com.au/'

req = urllib2.Request(url, headers=hdr)
# With the User-Agent set, this call no longer raised the 403 for me
html = urllib2.urlopen(req).read()
soup = BeautifulSoup(html, "html5lib")
print soup
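The question's code also went through a proxy and tried a plain urllib2.urlopen(url), neither of which the snippet above covers. One way to handle both, as a rough sketch rather than a tested fix for this site, is to set the User-Agent on the opener itself. The helper name build_browser_like_opener is hypothetical, the proxy string is the placeholder from the question, and the try/except import is only there so the same sketch also runs on Python 3 (where urllib2 became urllib.request).

```python
try:
    import urllib2 as urlrequest          # Python 2.7, as used in the question
except ImportError:
    import urllib.request as urlrequest   # Python 3 equivalent of urllib2

USER_AGENT = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36")


def build_browser_like_opener(proxy_url=None, user_agent=USER_AGENT):
    """Hypothetical helper: opener with a browser-like User-Agent and an optional HTTP proxy."""
    handlers = []
    if proxy_url:
        handlers.append(urlrequest.ProxyHandler({"http": proxy_url}))
    opener = urlrequest.build_opener(*handlers)
    # Replace the default ('User-agent', 'Python-urllib/x.y') entry on this opener.
    opener.addheaders = [("User-agent", user_agent),
                         ("Accept", "text/html,application/xhtml+xml,*/*")]
    return opener


# Placeholder proxy string copied from the question; substitute real values before use.
opener = build_browser_like_opener("http://username:password@ip:port")

# Not executed here because it needs network access and a real proxy:
# urlrequest.install_opener(opener)   # plain urlopen(url) would then send these headers
# html = urlrequest.urlopen('http://www.carnextdoor.com.au/').read()
```

Because the headers live on the opener, they apply to every request made through it, including urlopen(url) calls that pass a bare URL string once install_opener has been called.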

Without user-agent:

In [10]: hdr = {'Accept': 'text/html,application/xhtml+xml,*/*'}

In [11]: url = 'http://www.carnextdoor.com.au/'

In [12]: req=urllib2.Request(url,headers=hdr)

In [13]: html = urllib2.urlopen(req).read()
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-13-dbeb64d95cd3> in <module>()
----> 1 html = urllib2.urlopen(req).read()

With user-agent:

In [20]: hdr = {'Accept': 'text/html,application/xhtml+xml,*/*',"user-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36"}

In [21]: req=urllib2.Request(url,headers=hdr)
In [22]: html = urllib2.urlopen(req).read()
In [23]:
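The difference between the two sessions comes down to the User-Agent that urllib2 sends when none is given. The sketch below only inspects headers locally and makes no network request; it assumes nothing about the site beyond the URL already used above, and the try/except import is an addition so that it also runs on Python 3.

```python
try:
    import urllib2 as urlrequest          # Python 2.7, as used in the question
except ImportError:
    import urllib.request as urlrequest   # Python 3 equivalent of urllib2

# Every opener starts with a default header list that is used
# whenever the request itself does not supply that header.
opener = urlrequest.build_opener()
default_agent = dict(opener.addheaders).get('User-agent')
print('default User-Agent: %s' % default_agent)

# Headers passed to Request take precedence over the opener default.
hdr = {'Accept': 'text/html,application/xhtml+xml,*/*',
       'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                     '(KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'}
req = urlrequest.Request('http://www.carnextdoor.com.au/', headers=hdr)

# urllib stores header names in capitalized form, hence 'User-agent'.
custom_agent = req.get_header('User-agent')
print('custom User-Agent:  %s' % custom_agent)
```

The default value has the form Python-urllib/x.y, which a server can easily tell apart from a browser. That this particular site filters on it is an inference from the behaviour shown above, not something the site documents.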

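Using requests without setting any user-agent also works fine. requests sends its own default User-Agent (of the form python-requests/<version>) rather than urllib2's, and the site evidently accepts it: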

In [28]: import requests

In [29]: r = requests.get(url)

In [30]: r.status_code
Out[30]: 200
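To reproduce the 403 versus 200 behaviour without touching the real site, the sketch below starts a throwaway local server that rejects any client whose User-Agent does not look like a browser. The PickyHandler rule and the status_for helper are hypothetical stand-ins; the real site's filtering logic is unknown. The try/except imports are an addition so the sketch runs on both Python 2.7 and Python 3.

```python
import threading

try:  # Python 2.7
    import urllib2 as urlrequest
    from urllib2 import HTTPError
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
except ImportError:  # Python 3
    import urllib.request as urlrequest
    from urllib.error import HTTPError
    from http.server import BaseHTTPRequestHandler, HTTPServer


class PickyHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a site that rejects non-browser clients."""

    def do_GET(self):
        agent = self.headers.get('User-Agent', '')
        status = 200 if agent.startswith('Mozilla/') else 403
        self.send_response(status)
        self.send_header('Content-Length', '0')
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def status_for(url, headers):
    """Return the HTTP status code received for url with the given headers."""
    # An empty ProxyHandler ignores proxy environment variables for this local test.
    opener = urlrequest.build_opener(urlrequest.ProxyHandler({}))
    try:
        return opener.open(urlrequest.Request(url, headers=headers), timeout=5).getcode()
    except HTTPError as err:
        return err.code  # urllib raises HTTPError for 4xx/5xx responses


# Port 0 lets the operating system pick a free port.
server = HTTPServer(('127.0.0.1', 0), PickyHandler)
thread = threading.Thread(target=server.serve_forever)
thread.daemon = True
thread.start()
url = 'http://127.0.0.1:%d/' % server.server_address[1]

accept_only = {'Accept': 'text/html,application/xhtml+xml,*/*'}
with_browser_agent = dict(accept_only)
with_browser_agent['User-Agent'] = ('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                                    '(KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36')

without_agent = status_for(url, accept_only)
with_agent = status_for(url, with_browser_agent)

server.shutdown()
server.server_close()
print('without User-Agent: %s, with User-Agent: %s' % (without_agent, with_agent))
```

Catching HTTPError and reading its code attribute, as status_for does, is also a convenient way to see the exact status a real site returns instead of stopping at the traceback.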

Answer 1 (accepted, score 0), 2016-03-07 00:36:36
