I'm using the Python requests library (version 2.4.1) to perform a simple GET request; the code is below, nothing fancy. On most websites there are no issues. But on some, www.pricegrabber.com in particular, I see 100% CPU usage and the code never moves past the GET request. No timeout occurs; there is just a CPU usage spike that never stops.
import requests
url = 'http://www.pricegrabber.com'
r = requests.get(url, timeout=(1, 1))
print 'SUCCESS'
print r
Using Python 2.7 and the latest stable version of the requests library, I enabled logging as shown in this answer. The log indicates that the HTTP request is stuck in a redirect loop:
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): www.pricegrabber.com
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 301 20
DEBUG:requests.packages.urllib3.connectionpool:"GET /index.php/ut=43bb2597a77557f5 HTTP/1.1" 301 20
DEBUG:requests.packages.urllib3.connectionpool:"GET /?ut=43bb2597a77557f5 HTTP/1.1" 301 20
DEBUG:requests.packages.urllib3.connectionpool:"GET /?ut=43bb2597a77557f5 HTTP/1.1" 301 20
DEBUG:requests.packages.urllib3.connectionpool:"GET /?ut=43bb2597a77557f5 HTTP/1.1" 301 20...
This continues until the redirect limit is reached and requests raises:
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
And the code I used to discover this:
#!/usr/bin/env python
import logging
import requests
logging.basicConfig(level=logging.DEBUG)
url = 'http://www.pricegrabber.com'
r = requests.get(url, timeout=(1, 1))
print 'SUCCESS'
print r
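To see where the loop starts without waiting for 30 redirects, one option is to follow the redirects by hand and stop when a URL repeats. The following is only a rough sketch, not something from the original code: the helper names `requests_fetch` and `trace_redirects` are made up for illustration, and it assumes that requesting the same URL twice means a loop, which ignores redirects that depend on cookies. It relies on `allow_redirects=False`, which tells requests to return the 3xx response instead of following it.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7


def requests_fetch(url):
    """Fetch one URL without following redirects; return (status, location)."""
    import requests  # imported lazily so trace_redirects can be used with a stub
    r = requests.get(url, timeout=(1, 1), allow_redirects=False)
    return r.status_code, r.headers.get('Location')


def trace_redirects(url, fetch=requests_fetch, max_hops=30):
    """Follow redirects one hop at a time and stop as soon as a URL repeats.

    Returns (chain, looped): the list of URLs requested or about to be
    requested, and True if the chain ended because a URL was seen twice.
    looped is False if a non-redirect response was reached or if max_hops
    ran out before any repeat was seen.
    """
    chain = [url]
    seen = {url}
    for _ in range(max_hops):
        status, location = fetch(url)
        if status not in (301, 302, 303, 307, 308) or not location:
            return chain, False  # not a redirect: this is the final response
        url = urljoin(url, location)  # the Location header may be relative
        chain.append(url)
        if url in seen:
            return chain, True  # redirected to a URL already visited
        seen.add(url)
    return chain, False
```

Calling `trace_redirects('http://www.pricegrabber.com')` would return the chain of URLs and a flag saying whether a repeat was found. If you only want to fail sooner rather than diagnose the loop, it may be enough to lower `max_redirects` on a `requests.Session` and catch `requests.exceptions.TooManyRedirects`.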