Issue with too many redirects with python requests module

I am trying to unshorten a list of roughly 150,000 t.co links. My code works for the most part, but a bunch of the t.co links all redirect to the same page, and for some reason requests fails on them with a too-many-redirects error.

import requests

def expand_url(url):
    s = requests.Session()
    try:
        # HEAD request, following redirects until the final URL is reached
        r = s.head(url.rstrip(), allow_redirects=True, verify=False)
        return r.url.rstrip()
    except requests.exceptions.ConnectionError as e:
        print(e)

I tried setting s.headers['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36' as suggested in another thread. I also tried increasing the maximum number of redirects, but that didn't really help.

Here are some of the t.co links that are causing the issue:

https://t dot co/5FXvHY1Rbx

https://t dot co/L3Ytnz2916

Any suggestions on what to do?

Thanks

Set the maximum number of redirects you are willing to follow.

http://docs.python-requests.org/en/master/api/#requests.Session.max_redirects

s = requests.Session()
s.max_redirects = 3
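
One thing to watch with a low limit: when it is exceeded, requests raises requests.exceptions.TooManyRedirects, which is not a subclass of ConnectionError, so a handler for ConnectionError alone will not catch it. Below is a minimal sketch, assuming you want the last URL reached when the limit is hit. The function name expand_url_limited and the timeout value are my own choices for illustration, not something from the question.

```python
import requests

def expand_url_limited(url, max_redirects=3, timeout=10):
    """Return the final URL, or the last URL reached if the redirect limit is hit.

    Hypothetical helper: the name, max_redirects default and timeout are assumptions.
    """
    with requests.Session() as s:
        s.max_redirects = max_redirects
        try:
            r = s.head(url.strip(), allow_redirects=True, timeout=timeout)
            return r.url
        except requests.exceptions.TooManyRedirects as e:
            # e.response is the last response received before requests gave up
            return e.response.url if e.response is not None else None
        except requests.exceptions.RequestException as e:
            # Any other requests failure (connection error, timeout, ...)
            print(e)
            return None
```

For a looping link this returns the URL that the loop settled on instead of raising, which may already be the expanded address you were after.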

The reason you fall into an endless loop is that WH does not support the HEAD method, so it keeps sending you 302 Moved Temporarily. In fact, the redirect has already finished by then (from the short URL to WH). Try using r.history to see all the responses:

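import requests

def expand_url(url):
    s = requests.Session()
    try:
        # Use GET instead of HEAD, because the target keeps answering HEAD with 302.
        # allow_redirects is a boolean; a numeric limit belongs on s.max_redirects.
        r = s.get(url.rstrip(), allow_redirects=True, verify=False)
        # r.history holds the intermediate redirect responses, oldest first
        print([resp.url for resp in r.history])
        return r.url.rstrip()
    except requests.exceptions.ConnectionError as e:
        print(e)

print(expand_url("https://t<dot>co/5FXvHY1Rbx"))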
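
If the target really does answer HEAD with an endless 302 but serves GET normally, one option is to keep the cheap HEAD request for the bulk of the links and fall back to GET only when the redirect limit is hit. This is a rough sketch under that assumption. The name expand_url_with_fallback and the timeout are illustrative and not part of the original code.

```python
import requests

def expand_url_with_fallback(url, timeout=10):
    """Resolve a short URL with HEAD first, retrying with GET if HEAD loops.

    Hypothetical helper: the name and timeout value are assumptions.
    """
    url = url.strip()
    with requests.Session() as s:
        try:
            r = s.head(url, allow_redirects=True, timeout=timeout)
            return r.url
        except requests.exceptions.TooManyRedirects:
            pass  # HEAD kept redirecting; retry the same URL with GET below
        # stream=True defers downloading the body; only the final URL is needed
        with s.get(url, allow_redirects=True, stream=True, timeout=timeout) as r:
            return r.url
```

With 150,000 links, this keeps GET traffic limited to the links that actually need it. Errors from the GET attempt are left to the caller here.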

You can also implement your own redirect limit:

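import requests
from urllib.parse import urljoin

def expand_url(url, times):
    url = url.rstrip()
    times -= 1
    if times <= 0:
        # Redirect budget used up: return whatever URL we have reached
        return url
    s = requests.Session()
    try:
        # Session.head() does not follow redirects by default, so each call is one hop
        r = s.head(url, verify=False)
        location = r.headers.get("location")
        if not location:
            # No Location header: this is the final URL
            return url
        # Location may be relative, so resolve it against the current URL
        location = urljoin(url, location.strip())
        if location == url:
            # Redirect points back to the same page
            return url
        return expand_url(location, times)
    except requests.exceptions.ConnectionError as e:
        print(e)

print(expand_url("https://t<dot>co/5FXvHY1Rbx", 4))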
