
Python urllib2 problems

I have a very basic script to download a website using Python urllib2.

This has been working brilliantly for the past 6 months, but this morning it stopped working.

#!/usr/bin/python
import urllib2
proxy_support = urllib2.ProxyHandler({'http': 'http://DOMAIN\USER:PASS@PROXY:PORT/'})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
translink = open('/tmp/trains.html' ,'w')
response = urllib2.urlopen('http://translink.com.au')
html = response.read()
translink.write(html)
translink.close()

I am now getting the following error:

Traceback (most recent call last):
  File "./gettrains.py", line 7, in <module>
    response = urllib2.urlopen('http://translink.com.au')
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 407, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 520, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 445, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 379, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 528, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 502: Proxy Error ( The HTTP message includes an unsupported header or an unsupported combination of headers.  )

I am new to Python, so any help would be very much appreciated.

Cheers

#!/usr/bin/python
import requests

# Placeholder proxy URL; the raw string keeps the backslash in domain\user literal.
proxy_url = r"http://domain\user:pass@proxy:port"
proxies = {
    "http": proxy_url,
    "https": proxy_url,
}
response = requests.get("http://translink.com.au", proxies=proxies)
# response.content is bytes, so the file is opened in binary mode.
with open('/tmp/trains.html', 'wb') as translink:
    translink.write(response.content)

Try changing a header, for example the User-Agent:

opener = urllib2.build_opener(proxy_support)
opener.addheaders = [('User-Agent', 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)')]
urllib2.install_opener(opener)

I had the same problem a few days ago: my proxy rejected the default header User-Agent: Python-urllib/2.7.
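For reference, here is a minimal sketch of the same idea as a reusable helper. It assumes Python 3, where urllib2 was split into urllib.request and urllib.error. The proxy URL and the helper name build_opener_with_user_agent are placeholders for illustration, not values from the question.

```python
import urllib.request

# Placeholder proxy URL (assumption); substitute your own host, port and credentials.
PROXY_URL = "http://proxy.example.com:8080/"
# Browser-like User-Agent string taken from the snippet above.
USER_AGENT = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"


def build_opener_with_user_agent(proxy_url=PROXY_URL, user_agent=USER_AGENT):
    """Return an opener that routes HTTP through proxy_url and sends user_agent."""
    proxy_support = urllib.request.ProxyHandler({"http": proxy_url})
    opener = urllib.request.build_opener(proxy_support)
    # Replace the default header list, which only holds
    # ('User-agent', 'Python-urllib/x.y').
    opener.addheaders = [("User-Agent", user_agent)]
    return opener
```

Calling build_opener_with_user_agent().open('http://translink.com.au').read() should then fetch the page through the proxy, much like the original script, provided the proxy is reachable.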

To simplify things, I would avoid configuring the proxy inside Python and let the OS handle it instead. You can do this by setting an environment variable (for example, export http_proxy="your_proxy" on Linux). Then fetch the file directly from Python with urllib2 or requests; you could also consider the wget module.
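A rough sketch of how the environment-variable approach looks from the Python side, assuming Python 3 (urllib.request rather than urllib2). The proxy address is a made-up placeholder, and the variable is set inside the script only to keep the example self-contained; normally it would come from the shell.

```python
import os
import urllib.request

# Normally set in the shell, e.g. export http_proxy="http://proxy.example.com:8080".
# Set here only so the sketch is self-contained; the address is a placeholder.
os.environ["http_proxy"] = "http://proxy.example.com:8080"

# getproxies() reports the proxy settings urllib picks up from the environment.
detected = urllib.request.getproxies()
print(detected.get("http"))

# A ProxyHandler created without arguments uses those same detected settings,
# so no proxy URL has to be hard-coded in the script.
opener = urllib.request.build_opener(urllib.request.ProxyHandler())
```

The benefit of this design is that credentials and proxy addresses stay out of the script, so a proxy change only needs a change to the environment.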

It's also possible that your proxy was changed and now forwards requests with headers that the final destination no longer accepts. In that case there is very little you can do.
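Before concluding that, it may help to look at what the proxy actually sent back. A small sketch, assuming Python 3 (where urllib2.HTTPError lives in urllib.error); the helper names describe_http_error and fetch are made up for illustration.

```python
import urllib.error
import urllib.request


def describe_http_error(err):
    """Summarise an HTTPError: status code, reason and response headers."""
    lines = ["HTTP %d: %s" % (err.code, err.reason)]
    for name, value in err.headers.items():
        lines.append("%s: %s" % (name, value))
    return "\n".join(lines)


def fetch(url):
    """Return the body of url, printing diagnostics if the request is refused."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # Show the status line and headers of the error response, then re-raise.
        print(describe_http_error(err))
        raise
```

Headers such as Via or Server in that output can hint at whether the 502 came from the proxy itself or from the destination.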


 