This URL (the one assigned to url in the code below) redirects to: http://www.callmyname.sg/view/Multiple+Choices/Uk9JRC9TRzA0SkstQkJDNkRFNTEuMTNCNS9FRDY5LUE4NzgtRUY=
When using requests, I get:
import requests
url = "http://www.yellowpages.com.sg/newiyp/UrlRedirect?applicationInd=yp&searchType=68&searchCriteria=multiple+choices&accessType=8&advertiserName=Multiple+Choices&url=62CE8F02A1BE04A51C81F85D1CE8B54DFC608A9CDA925C15EED5DA6DD90E3F7DC99CFF77216D1D1083877BA841EB97C3"
response = requests.get(url)
response.url
It returns the same first URL, not the redirected URL.
Here is a sample. I used bit.ly because I got a 403 with your URL.
>>> url = "http://bit.ly/18SuUzJ"
>>> r = requests.get(url, allow_redirects=False)
>>> r.status_code
301
>>> r.headers['Location']
'http://stackoverflow.com/'
According to the requests documentation, r.history is what you need.
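As a rough illustration of how r.history can be used, here is a minimal sketch. The helper name redirect_chain is made up for this example; it only assumes that the response has the status_code, url and history attributes that requests responses provide.

```python
def redirect_chain(response):
    """Return [(status_code, url), ...] for every hop, ending with the final response.

    redirect_chain is a hypothetical helper, not part of requests.
    response.history holds the intermediate responses, oldest first.
    """
    hops = list(response.history) + [response]
    return [(hop.status_code, hop.url) for hop in hops]


# Possible usage (needs network access and the third-party requests package):
#   import requests
#   r = requests.get("http://bit.ly/18SuUzJ")
#   for status, url in redirect_chain(r):
#       print(status, url)
```

If r.history is empty, no redirect was followed and the list contains only the final response.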
This site seems to require a session cookie in order for the redirect to work.
r.url does in fact show the URL after the redirect (unless you have changed the configuration).
The problem with your redirect is that it never happens if the cookie isn't already there. You can test that by visiting the URL with a browser in incognito/private mode. You will see an error message from http://www.yellowpages.com.sg/ with a status code 200. On a reload you will then be redirected.
Strangely, I cannot get a redirect even with a requests session. Using a real browser's user agent string doesn't seem to help, either. You might have to compare the two requests in detail to find what the crucial difference is.
The code I tried looks like this:
import requests

# 'user_agent' is a placeholder; substitute a real browser's User-Agent string.
headers = {'User-Agent': 'user_agent'}
s = requests.Session()
# Visit the home page first so the session can pick up any cookies.
url = "http://www.yellowpages.com.sg/"
r = s.get(url, headers=headers)
# Then request the redirect URL with the same session.
url = "http://www.yellowpages.com.sg/newiyp/UrlRedirect?applicationInd=yp&searchType=68&searchCriteria=multiple+choices&accessType=8&advertiserName=Multiple+Choices&url=62CE8F02A1BE04A51C81F85D1CE8B54DFC608A9CDA925C15EED5DA6DD90E3F7DC99CFF77216D1D1083877BA841EB97C3"
r = s.get(url, headers=headers)
r.url
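One way to compare the two requests in detail is to diff the headers the browser sent (copied from its developer tools) against the ones requests sent, which are available as r.request.headers. This is only a sketch; diff_headers is a hypothetical helper, and the header values in the usage comment are invented.

```python
def diff_headers(a, b):
    """Return {name: (value_in_a, value_in_b)} for headers that differ.

    Header names are compared case-insensitively; a missing header shows up as None.
    diff_headers is a hypothetical helper, not part of requests.
    """
    a = {k.lower(): v for k, v in a.items()}
    b = {k.lower(): v for k, v in b.items()}
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


# Possible usage, with made-up browser headers:
#   browser = {"User-Agent": "Mozilla/5.0 ...", "Cookie": "..."}
#   print(diff_headers(browser, r.request.headers))
```

A header that appears only on the browser side, such as Cookie or Referer, would be a natural first suspect.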
A HEAD request could be faster than a GET request. That's even if the GET redirects are not followed. This is because HEAD returns the headers only, not the content, whereas GET returns both.
>>> import requests
>>> response = requests.head('https://bit' + '.ly/pyre', allow_redirects=False)
>>> response.is_redirect
True
>>> response.headers['Location']
'http://www.python.org/doc/current/library/re.html'
The above approach should identify exactly one level of redirect. Also, to keep it simple, I use requests.head instead of requests.Session().head.
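If more than one level of redirect needs to be resolved, the same idea can be repeated hop by hop. The following is a minimal sketch, not a definitive implementation: resolve_redirects, its head parameter and the max_hops limit are all assumptions made for this example. The head callable is passed in so the loop can be tried without network access.

```python
from urllib.parse import urljoin

# Status codes treated as redirects in this sketch.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_redirects(url, head, max_hops=10):
    """Follow redirects one hop at a time and return the list of URLs visited.

    head is any callable that takes a URL and returns an object with
    status_code and headers, for example:
        lambda u: requests.head(u, allow_redirects=False)
    """
    visited = [url]
    for _ in range(max_hops):
        response = head(url)
        location = response.headers.get("Location")
        if response.status_code not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        visited.append(url)
    return visited
```

The max_hops limit keeps a redirect loop from running forever; the last entry of the returned list is the final URL reached.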