Python Requests getting ('Connection aborted.', BadStatusLine("''",)) error

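import os
import sys

import requests


def download_torrent(url):
    # Name the file after whatever follows 'title=' in the URL.
    fname = os.path.join(os.getcwd(), url.split('title=')[-1] + '.torrent')
    try:
        # The URL is passed in without a scheme, so prepend one.
        schema = 'http:'
        r = requests.get(schema + url, stream=True)
        with open(fname, 'wb') as f:
            # Write the response body to disk in 1 KiB chunks.
            for chunk in r.iter_content(chunk_size=1024):
                if chunk:
                    f.write(chunk)
                    f.flush()
    except requests.exceptions.RequestException as e:
        # OutColors is defined elsewhere in the full script.
        print('\n' + OutColors.LR + str(e))
        sys.exit(1)

    return fname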

I get an error from this block of code when I run the full script. When I go to actually download the torrent, I get:

('Connection aborted.', BadStatusLine("''",))

I have only posted the block of code that I think is relevant above. The entire script is below. It's from pantuts, but I don't think it's maintained any longer, and I am trying to get it running with Python 3. From my research, the error might mean I'm using http instead of https, but I have tried both.

Original script

The error you get indicates that the host isn't responding in the expected manner. In this case, it's because the host detects that you're trying to scrape it and deliberately disconnects you.
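To see what "not responding in the expected manner" looks like on the wire, here is a minimal sketch that uses only the standard library. The helper names `start_silent_server` and `probe` are made up for this illustration. The local server reads the request and then closes the socket without sending a status line, which is one way a host can drop a client it doesn't like. In Python 3, `http.client` reports this as `RemoteDisconnected`, a subclass of `BadStatusLine`; `requests` wraps the same failure in a `ConnectionError` whose message begins with 'Connection aborted.'.

```python
import http.client
import socket
import threading


def start_silent_server():
    """Accept one connection on a free local port, then hang up without replying."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve_once():
        conn, _ = listener.accept()
        conn.recv(65536)  # read the HTTP request
        conn.close()      # close without sending any status line
        listener.close()

    threading.Thread(target=serve_once, daemon=True).start()
    return port


def probe(port):
    """Send a GET to the local server and return the exception raised, if any."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/")
        conn.getresponse()
    except http.client.BadStatusLine as exc:
        return exc
    finally:
        conn.close()
    return None


if __name__ == "__main__":
    error = probe(start_silent_server())
    print(type(error).__name__, error)
```

Nothing here proves that a particular site is blocking you on purpose; it only shows that an empty reply is enough to produce this error.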

If you try your requests code with this URL from a test website: http://mirror.internode.on.net/pub/test/5meg.test1 , you'll see that it downloads normally.

To get around this, fake your user agent. Your user agent identifies your web browser, and web hosts commonly check it to detect bots.
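If you are curious what requests sends when you don't set anything, the sketch below prepares a request without sending it and returns the headers that would go out. The function name `outgoing_headers` and the `example.invalid` URL are placeholders chosen for this illustration. By default the User-Agent starts with `python-requests/`, which is easy for a host to recognise.

```python
import requests


def outgoing_headers(url, headers=None):
    """Return the headers requests would send for a GET, without touching the network."""
    session = requests.Session()
    request = requests.Request("GET", url, headers=headers)
    # prepare_request() merges the session's default headers with ours.
    prepared = session.prepare_request(request)
    return dict(prepared.headers)


if __name__ == "__main__":
    # Default headers: the User-Agent announces python-requests and its version.
    print(outgoing_headers("http://example.invalid/"))
    # With an explicit User-Agent, the default is replaced.
    print(outgoing_headers("http://example.invalid/",
                           headers={"User-Agent": "Mozilla/5.0"}))
```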

Use the headers field to set your user agent. Here's an example which tells the web host you're Firefox.

headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0' }
r = requests.get(url, headers=headers)
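Applied to the function from the question, that might look roughly like the sketch below. It assumes, as the original code does, that `url` arrives without a scheme. The `directory` and `session` parameters, the `torrent_filename` helper, the timeout and the `raise_for_status()` call are additions for this illustration and are not part of the original script.

```python
import os

import requests

# Same Firefox User-Agent string as in the example above.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) "
                  "Gecko/20100101 Firefox/24.0"
}


def torrent_filename(url, directory="."):
    """Build the output path from whatever follows 'title=' in the URL."""
    return os.path.join(directory, url.split("title=")[-1] + ".torrent")


def download_torrent(url, directory=".", session=None):
    """Stream the file to disk while sending a browser-like User-Agent."""
    if session is None:
        session = requests.Session()
    fname = torrent_filename(url, directory)
    with session.get("http:" + url, headers=BROWSER_HEADERS,
                     stream=True, timeout=30) as r:
        r.raise_for_status()  # turn 4xx/5xx replies into exceptions
        with open(fname, "wb") as f:
            for chunk in r.iter_content(chunk_size=1024):
                f.write(chunk)
    return fname
```

Letting the exception propagate instead of calling `sys.exit(1)` inside the function leaves it to the caller to decide whether to retry, skip the file, or stop.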

There are lots of other discrepancies [1] between bots and human-operated browsers that web hosts can check for, but the user agent is one of the easiest and most common ones.

If you want your scraper to be harder to detect, you'll want to use a headless browser like headless Chrome [2] (or ghost.py if you want to stick with Python), which you can trust to behave like a real browser (because it is one!).


Footnotes:

[1] Other possible checks include whether images aren't being downloaded, whether page resources aren't downloaded in the normal order, whether pages are being downloaded faster than a human can read them, and whether cookies aren't being set properly. Google flags mouse movements it deems insufficiently human-like.

[2] Headless Chrome is the most competent headless browser in 2018, but if its weight is a problem for you, its slightly outdated predecessors, PhantomJS and ghost.py, are lighter and still usable.
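Two of the checks in footnote 1, download speed and cookies, can be partly addressed without a headless browser. The sketch below is one possible approach and not something the sites involved are known to require: the `PoliteSession` wrapper (a made-up name) reuses a single `requests.Session`, so cookies set by the server are sent back on later requests, and it waits a minimum interval between requests. The two-second default is an arbitrary assumption.

```python
import time

import requests


class PoliteSession:
    """Wrap a requests.Session and leave at least min_interval seconds between GETs."""

    def __init__(self, min_interval=2.0, session=None,
                 clock=time.monotonic, sleep=time.sleep):
        # A Session keeps cookies between requests, like a browser does.
        self.session = session if session is not None else requests.Session()
        self.min_interval = min_interval
        self._clock = clock  # injectable for testing
        self._sleep = sleep  # injectable for testing
        self._last_request = None

    def get(self, url, **kwargs):
        if self._last_request is not None:
            elapsed = self._clock() - self._last_request
            remaining = self.min_interval - elapsed
            if remaining > 0:
                self._sleep(remaining)  # wait out the rest of the interval
        self._last_request = self._clock()
        return self.session.get(url, **kwargs)
```

Usage would be `polite = PoliteSession()` followed by `polite.get(url, headers=headers)` wherever the script currently calls `requests.get`.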

Try this:

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0',
    'ACCEPT' : 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'ACCEPT-ENCODING' : 'gzip, deflate, br',
    'ACCEPT-LANGUAGE' : 'ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7',
    'REFERER' : 'https://www.google.com/'
}

r = requests.get("http://yourdomain.com/", headers=headers)

In my case, I had to remove the User-Agent field from the headers:

url='https://...'
headers = {}
requests.get(url, headers=headers)

Once I set 'User-Agent', I get ('Connection aborted.', BadStatusLine("''",)), and this error occurs only with that one site. This is my first post; I have had a lot of help from this site, and I hope this helps others who end up here.
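Since the answers above disagree about whether a site wants a browser-like User-Agent or none at all, one pragmatic option is to try several header sets in order and keep the first one that gets a reply. This is only a sketch: `fetch_with_fallback` and its `get` parameter are made up for this illustration, and the 10-second timeout is an arbitrary assumption.

```python
import requests


def fetch_with_fallback(url, header_candidates, get=requests.get):
    """Try each header dict in turn; return (response, headers) for the first that connects."""
    if not header_candidates:
        raise ValueError("header_candidates must not be empty")
    last_error = None
    for headers in header_candidates:
        try:
            response = get(url, headers=headers, timeout=10)
            return response, headers
        except requests.exceptions.ConnectionError as exc:
            # Includes ('Connection aborted.', BadStatusLine(...)); try the next set.
            last_error = exc
    raise last_error


if __name__ == "__main__":
    candidates = [
        {"User-Agent": "Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) "
                       "Gecko/20100101 Firefox/24.0"},
        {},  # fall back to the requests defaults
    ]
    # Replace the URL with the site you are actually fetching.
    response, used = fetch_with_fallback("http://yourdomain.com/", candidates)
    print(response.status_code, used)
```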


