Get status_code with max_retries setting for requests.head

As seen here, max_retries can be set for requests.Session(), but I only need the head.status_code to check whether a URL is valid and active.

Is there a way to just get the head within a mounted session?

import requests

def valid_active_url(url):
    try:
        site_ping = requests.head(url, allow_redirects=True)
    except requests.exceptions.ConnectionError:
        print('Error trying to connect to {}.'.format(url))
        return False  # without this, site_ping would be undefined below
    # Any status code below 400 counts as valid and active
    return site_ping.status_code < 400

Based on the docs, I am thinking I need to either:

  • see if the result of the session.mount method returns a status code (which I haven't found yet)
  • roll my own retry method, perhaps with a decorator like this or this or a (less elegant) loop like this.
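For the second option, a hand-rolled retry decorator could look roughly like the sketch below. The name `retry` and its parameters `times`, `delay`, and `exceptions` are made up for illustration; they are not part of Requests.

```python
import functools
import time


def retry(times=3, delay=0.0, exceptions=(Exception,)):
    """Call the wrapped function up to `times` times (hypothetical helper)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == times:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay)  # wait before the next attempt
        return wrapper
    return decorator
```

Decorating a function that calls requests.head with something like `@retry(times=3, exceptions=(requests.exceptions.ConnectionError,))` would then repeat the request on connection errors and re-raise after the last attempt.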

In terms of the first approach, I have tried:

import requests

s = requests.Session()
a = requests.adapters.HTTPAdapter(max_retries=3)
# Note: the adapter only applies to URLs that start with the mounted prefix
s.mount('http://redirected-domain.com', a)
resp = s.get('http://www.redirected-domain.org')
resp.status_code

Are we only using s.mount() to get in and set max_retries? That seems redundant, aside from the fact that the HTTP connection would be pre-established.

Also, resp.status_code returns 200 where I am expecting a 301 (which is what requests.head returns).

NOTE: resp.ok might be all I need for my purposes here.
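On the 200-versus-301 observation: in Requests, get follows redirects by default while head does not, so get reports the status of the final page and head reports the redirect itself. A small sketch for inspecting every hop through resp.history follows; the helper name `status_chain` is hypothetical.

```python
import requests


def status_chain(resp):
    """Return the status code of each redirect hop, then the final response."""
    # resp.history holds the intermediate redirect responses, oldest first
    return [hop.status_code for hop in resp.history] + [resp.status_code]
```

Passing allow_redirects=False to s.get, or allow_redirects=True to s.head, should make the two methods behave alike in this respect.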

After a mere two hours of developing the question, the answer took five minutes:

import requests

def valid_url(url):
    # Treat empty or placeholder values as invalid
    if url.lower() == 'none' or url == '':
        return False
    try:
        s = requests.Session()
        a = requests.adapters.HTTPAdapter(max_retries=5)
        s.mount(url, a)  # the adapter applies to URLs starting with this prefix
        resp = s.head(url)
        return resp.ok  # True for any status code below 400
    except requests.exceptions.MissingSchema:
        # If it's missing the schema, run again with schema added
        return valid_url('http://' + url)
    except requests.exceptions.ConnectionError:
        print('Error trying to connect to {}.'.format(url))
        return False

Based on this answer, it looks like the head request will be slightly less resource intensive than the get, because only the headers are returned and not the body. That matters particularly if the URL serves a large amount of data.

The requests.adapters.HTTPAdapter is the built-in adapter for the urllib3 library that underlies the Requests library.
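Because the adapter sits on top of urllib3, max_retries can also be given a urllib3 Retry object instead of an integer, and the adapter can be mounted on the 'http://' and 'https://' prefixes so that one session covers any URL. The following is only a sketch under those assumptions: the helper names `make_retrying_session` and `url_is_reachable`, the backoff value, and the list of retried status codes are illustrative choices, not something from the answer above.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=5, backoff_factor=0.5):
    """Build a Session that retries http:// and https:// requests (hypothetical helper)."""
    retry = Retry(total=total, backoff_factor=backoff_factor,
                  status_forcelist=(502, 503, 504))  # assumed set of retryable codes
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session


def url_is_reachable(url, session=None, timeout=5):
    """Return True if a HEAD request ends with a status code below 400."""
    session = session or make_retrying_session()
    try:
        return session.head(url, allow_redirects=True, timeout=timeout).ok
    except requests.exceptions.RequestException:
        # Covers connection errors, timeouts, exhausted retries and malformed URLs
        return False
```

Unlike valid_url, this version does not add a missing scheme; a URL without one is simply reported as unreachable.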

On another note, I'm not sure what the correct term or phrase is for what I'm checking here. A URL could still be valid even if it returns an error code.
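One possible way to separate the two ideas is to call a URL "well-formed" when it parses correctly and "reachable" when a request to it succeeds. A purely syntactic check, with no network access, might be sketched as follows; the name `is_well_formed` and the restriction to http and https are assumptions made for this example.

```python
from urllib.parse import urlparse


def is_well_formed(url):
    """Syntactic check only: an http(s) scheme and a host are present."""
    parts = urlparse(url)
    return parts.scheme in ('http', 'https') and bool(parts.netloc)
```

A URL can pass this check and still return a 404 or refuse the connection, which is the case valid_url is really testing for.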
