How can I retrieve files with User-Agent headers in Python 3?
I'm trying to write a (simple) piece of code to download files off the internet. The problem is, some of these files are on websites that block the default Python User-Agent header. For example:
import urllib.request as html
html.urlretrieve('http://stackoverflow.com', 'index.html')
returns
urllib.error.HTTPError: HTTP Error 403: Forbidden
Normally, I would set the headers in the request, such as:
import urllib.request as html
request = html.Request('http://stackoverflow.com', headers={"User-Agent":"Firefox"})
response = html.urlopen(request)
However, since urlretrieve doesn't accept a Request object for some reason, this isn't an option.
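(Side note: while urlretrieve ignores Request objects, it does route its requests through the globally installed opener, so installing an opener with different default headers changes the User-Agent that urlretrieve sends. A sketch of that workaround, with an arbitrary user-agent string:)

```python
import urllib.request

# Build an opener whose default headers replace the stock
# "Python-urllib/x.y" User-Agent, and install it globally.
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)

# From here on, urlretrieve uses the installed opener's headers:
# urllib.request.urlretrieve('http://stackoverflow.com', 'index.html')
```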
Are there any simple-ish solutions to this (that don't involve importing a library such as requests)? I've noticed that urlretrieve is part of the legacy interface ported over from Python 2; is there anything I should be using instead?
I tried creating a custom FancyURLopener class to handle retrieving files, but that caused more problems than it solved, such as creating empty files for links that 404.
You can subclass URLopener and set the version class variable to a different user-agent, then continue using urlretrieve.
Or you can simply use your second method and save the response to a file only after checking that code == 200.
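A sketch of that second method, wrapped in a hypothetical download() helper (the function name and user-agent string are illustrative): it writes the file only on a 200 response, which also avoids the empty-files-on-404 problem the question mentions.

```python
import shutil
import urllib.error
import urllib.request

def download(url, filename, user_agent='Mozilla/5.0'):
    """Fetch url with a custom User-Agent; save it only on HTTP 200."""
    request = urllib.request.Request(url, headers={'User-Agent': user_agent})
    try:
        with urllib.request.urlopen(request) as response:
            if response.status != 200:
                return False
            with open(filename, 'wb') as out_file:
                # Stream the body to disk instead of reading it into memory
                shutil.copyfileobj(response, out_file)
            return True
    except urllib.error.URLError:
        # Covers HTTPError (e.g. 403, 404) as well as connection failures,
        # so no file is created for failed requests.
        return False
```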