
Find the redirected URL with python requests library or otherwise

This URL:

http://www.yellowpages.com.sg/newiyp/UrlRedirect?applicationInd=yp&searchType=68&searchCriteria=multiple+choices&accessType=8&advertiserName=Multiple+Choices&url=62CE8F02A1BE04A51C81F85D1CE8B54DFC608A9CDA925C15EED5DA6DD90E3F7DC99CFF77216D1D1083877BA841EB97C3

Redirects to: http://www.callmyname.sg/view/Multiple+Choices/Uk9JRC9TRzA0SkstQkJDNkRFNTEuMTNCNS9FRDY5LUE4NzgtRUY=

When using requests, I get:

import requests

url = "http://www.yellowpages.com.sg/newiyp/UrlRedirect?applicationInd=yp&searchType=68&searchCriteria=multiple+choices&accessType=8&advertiserName=Multiple+Choices&url=62CE8F02A1BE04A51C81F85D1CE8B54DFC608A9CDA925C15EED5DA6DD90E3F7DC99CFF77216D1D1083877BA841EB97C3"
response = requests.get(url)
response.url

It returns the same first URL, not the redirected URL.

Here is a sample. I used bit.ly because I got a 403 using your URL.

>>> import requests
>>> url = "http://bit.ly/18SuUzJ"
>>> r = requests.get(url, allow_redirects=False)
>>> r.status_code
301
>>> r.headers['Location']
'http://stackoverflow.com/'

According to the requests documentation, r.history is what you need.
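As a rough illustration of how r.history can be used: it holds the intermediate responses, oldest first, while r.url is the final URL. The helper name redirect_chain below is made up for this sketch, and the commented usage assumes the bit.ly link from the sample above still redirects.

```python
def redirect_chain(response):
    """Return every URL visited, in order, ending with the final URL.

    `response` is expected to look like a requests.Response: `history`
    is a list of the intermediate redirect responses and `url` is the
    URL of the final response.
    """
    return [hop.url for hop in response.history] + [response.url]


# Hypothetical usage (needs network access, so it is left commented out):
#
#   import requests
#   r = requests.get("http://bit.ly/18SuUzJ")
#   for visited in redirect_chain(r):
#       print(visited)
```

If no redirect took place, r.history is empty and the list contains only the URL that was requested.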

This site seems to require a session cookie in order for the redirect to work.

r.url does in fact show the URL after the redirect (unless you have changed the configuration).

The problem with your redirect is that it never happens if the cookie isn't already there. You can test that by visiting the URL with a browser in incognito/private mode. You will see an error message from http://www.yellowpages.com.sg/ with a status code 200. On a reload you will then be redirected.

Strangely, I cannot get a redirect even with a requests session. Using a real browser's user agent string doesn't seem to help, either. You might have to compare the two requests in detail to find what the crucial difference is.

The code I tried looks like this:

import requests
# 'user_agent' is a placeholder; substitute a real browser's user agent string
headers = {'User-Agent': 'user_agent',}
s = requests.Session()
url = "http://www.yellowpages.com.sg/"
r = s.get(url, headers=headers)
url = "http://www.yellowpages.com.sg/newiyp/UrlRedirect?applicationInd=yp&searchType=68&searchCriteria=multiple+choices&accessType=8&advertiserName=Multiple+Choices&url=62CE8F02A1BE04A51C81F85D1CE8B54DFC608A9CDA925C15EED5DA6DD90E3F7DC99CFF77216D1D1083877BA841EB97C3"
r = s.get(url, headers=headers)
r.url
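One way to compare the two requests in detail is to diff the headers the browser sent (copied from its developer tools) against the headers requests sent, which are available as r.request.headers. This is only a sketch: the function name diff_headers and the sample header values, including the JSESSIONID cookie, are assumptions for illustration and not taken from the site.

```python
def diff_headers(browser_headers, script_headers):
    """Return the headers that differ between two requests.

    Header names are compared case-insensitively. The result maps each
    differing name (lowercased) to a (browser_value, script_value) pair,
    with None marking a header that one side did not send.
    """
    browser = {name.lower(): value for name, value in browser_headers.items()}
    script = {name.lower(): value for name, value in script_headers.items()}
    differences = {}
    for name in sorted(browser.keys() | script.keys()):
        if browser.get(name) != script.get(name):
            differences[name] = (browser.get(name), script.get(name))
    return differences


# Made-up example values; in practice pass dict(r.request.headers)
# as the second argument.
browser_sent = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "*/*",
    "Cookie": "JSESSIONID=abc",
}
script_sent = {"user-agent": "python-requests", "accept": "*/*"}

for name, (in_browser, in_script) in diff_headers(browser_sent, script_sent).items():
    print(f"{name}: browser={in_browser!r} script={in_script!r}")
```

Headers that appear only on the browser side, such as a cookie or a Referer, are the first candidates to replicate in the session.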

A HEAD request can be faster than a GET request, even when the GET does not follow redirects. This is because the response to HEAD contains only the headers, whereas the response to GET contains both the headers and the body.

>>> import requests
>>> response = requests.head('https://bit' + '.ly/pyre', allow_redirects=False)
>>> response.is_redirect
True
>>> response.headers['Location']
'http://www.python.org/doc/current/library/re.html'

The above approach identifies exactly one level of redirect. To keep it simple, I use requests.head instead of requests.Session().head.
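If more than one level needs to be resolved, the same one-hop check can be repeated in a loop. The following is a minimal sketch rather than a definitive implementation: resolve_redirects, requests_head and the max_hops limit are names chosen here, and the `head` argument is any callable that returns an object with status_code and headers, so the loop can be tried without network access.

```python
from urllib.parse import urljoin


def resolve_redirects(url, head, max_hops=10):
    """Follow Location headers one hop at a time and return the last URL.

    `head` is a callable taking a URL and returning a response-like
    object with `status_code` and `headers` (for example, a wrapper
    around requests.head with allow_redirects=False).
    """
    for _ in range(max_hops):
        response = head(url)
        location = response.headers.get("Location")
        if location is None or not 300 <= response.status_code < 400:
            return url  # not a redirect, so this is the final URL
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError(f"more than {max_hops} redirects")


def requests_head(url):
    """Assumed wrapper: a single HEAD request that does not follow redirects."""
    import requests  # imported here so the loop above has no hard dependency
    return requests.head(url, allow_redirects=False, timeout=10)


# Hypothetical usage (needs network access):
#   final_url = resolve_redirects("http://bit.ly/18SuUzJ", requests_head)
```

Some servers handle HEAD differently from GET, so a result obtained this way is worth confirming with a GET when it looks wrong.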
